Older blog entries for mslicker (starting at number 25)

tk, You're right, Forth has not stabilized. It does change; it is in the nature of the language, and of its philosophy, to change and adapt.

I wouldn't say that Chuck Moore is reinventing the language. The basics always stay: definitions and stacks. The operators have evolved from his chip design experience over the last 20 years. The use of color is a new innovation, a simplification of previous notation. Blocks have been around in Forth for quite a while. There are a lot of ideas and innovations in the implementation, but it could well be implemented much differently without change from the user's perspective.

We clearly differ on the philosophy of computer languages. To me, having so many computer languages kept alive just means a much greater maintenance cost. If there are better methods, people should be using them. A legacy language is just one in which new software should not be written.

C is known primarily as a systems language. Compared to Forth it is a very poor systems language. Forth can interface with hardware much more directly and compactly, and it allows clean integration of machine language code. It also features interactive development, which is ideal for debugging hardware device code.

C as an application language is beaten by many languages, including Forth. C provides you with one set of abstractions and a clumsy syntax. As a result, higher-level operations are very clumsy compared to languages that support these operations directly, or extensible languages that allow you to add them in an integrated way.

C as an Algol-family language seems poor. I'm not up on Algol languages, but I would be very surprised if C has not been surpassed. The fact that C requires two languages for what should easily be accomplished in one makes it an easy target for improvement.

There are some cases where writing new C code makes sense. The most usable free software systems and many commercial systems are C-based. When writing code for these systems, C is almost always available, and many libraries are written in C.

New systems should avoid the use of C, as clearly better methods exist.

Of course this is only my perspective. I have no control over what people use. Language choice is mostly guided by popularity anyway, not technical merit.

C vs. colorForth

tk, I wasn't making a comparison, just pointing out that C is very complex. You brought colorForth into the comparison, which I'm happy to discuss.

First, on the maturity of the languages. You portray C as a stabilized language, and colorForth as a completely new language. This is not true. For one, Forth predates both B and C. Just as you can track the evolution of the C language from BCPL to B to C and on to the later Cs, the same can be done for Forth. The main difference in the evolution is in the philosophy. From B to C, the language accumulated much complexity in a short period. Later Cs just agglomerate new features over time; this is the nature of the standardization process: features start as extensions and gradually work their way into the language. Forth, in contrast, removes features, adds features, and simplifies; it evolves in a directed way with the changing nature of machines and the tasks they perform.

colorForth is the latest step in the evolution of Forth. It is in keeping with the principles of Forth from its inception. This is important to note, since there are other languages which go by the name of "Forth" but don't at all keep with those principles. In technical terms, they are a branch or derivation, not within the true line of the language.

colorForth compares quite well with B in terms of simplicity too. I believe one reason is syntax. B, C, and pretty much all infix languages sacrifice a great amount of simplicity for familiarity. Whether chosen for popularity or for the designer's preference, infix is a large drawback in programming language design.

Now, on to your specific comparisons:

  • Locals

    For locals, it is true that colorForth's locals are unnamed stack values. That is actually an important feature of the language. Locals are implicit, which offers the following advantages:

    • You reduce the syntax required of the language, which makes the compiler simpler and the language easier to learn.
    • You increase the terseness of the language, which makes definitions shorter and clearer.
    • A well-kept stack is optimal for execution on stack machines, and to a lesser degree on register machines. Complex optimization logic is simply not needed for compilation, even on register machines.
    • colorForth always keeps a one-to-one correspondence between source and machine code. This makes the compilation and execution behavior of the code you write predictable. Locals would break this correspondence.

    The disadvantage is the one you pointed out: you have to keep track of the stack. This is a learned skill; some programmers are very set in their ways and won't make the effort. New programmers, I've heard, take to it quite naturally.

    Programs are typically written in a declarative style. Definitions are usually short, one or two lines. The idea is that though you have to keep track of the stack, you usually don't have to keep track of very much. Chuck Moore recommended that, on average, two parameters be passed to a word (see the sketch after this list).

    I should point out that colorForth doesn't completely eliminate variables. State variables exist; these are similar to the global variables of other languages, except that they live inside the source code, and when you save, the values get saved too.

    One program I've seen sets up an interrupt on the PIC and increments a variable in the source 18.2 times per second. Viewing the source in the editor shows the source changing as the variable is incremented.

  • Syntax

    Your comparison of operators/syntax is not correct. B and C have a structured BNF syntax and complex compile-time/run-time semantics. B is described in roughly 17k of text (from "Syntax" until "8.0"), the latest C in far more. colorForth has a linear syntax and simple compile-time/run-time semantics, described in about 3k of text. Describing an equivalent set of operators would require little more, since these build on the base semantics.

    The library comparison is not useful, since the libraries do different things. B is stream/file/Unix oriented; colorForth is oriented toward interactive graphics applications. Also, some of the words you include are colorForth applications: the editor (e, edit), block management (erase, copy), the interpreter (accept), booting the system (boot, warm). Some aren't accessible outside the kernel (pack), and some don't exist (font, clri). Some of the important ones, which would make a useful comparison (for operators only), are defined in colorForth source code.

  • Documentation

    The documentation that is lacking is a word list, actually promised by Chuck Moore on this page. In the meantime, the community has pitched in and created a page documenting the known words.
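To make the point about implicit locals concrete, here is a minimal sketch in C (not colorForth; the stack, push, pop, and average names are invented purely for illustration) of passing unnamed parameters on an explicit data stack:

#include <stdio.h>

/* Hypothetical explicit data stack, for illustration only. */
static int stack[16];
static int sp;                         /* number of items on the stack */

static void push(int x) { stack[sp++] = x; }
static int  pop(void)   { return stack[--sp]; }

/* "average" ( a b -- avg ): consumes two stack items, leaves one.
   Callers never name the parameters; they just arrange the stack. */
static void average(void)
{
    int b = pop();
    int a = pop();
    push((a + b) / 2);
}

int main(void)
{
    push(10);
    push(20);                          /* pass parameters on the stack */
    average();
    printf("%d\n", pop());             /* prints 15 */
    return 0;
}

The point of the sketch is only that the caller and the definition agree on a stack picture rather than on names; in colorForth the data stack is built in, so the bookkeeping above disappears.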

Although your comparison wasn't specifically with the C language, Chuck Moore has a critique/comparison of the C language with colorForth. I agree with the points he has made.

tk, Language is not just syntax, but also semantics. Further, there is concrete syntax and then there is abstract syntax. Concrete syntax serves only a mechanical purpose: to parse a language correctly and efficiently. Abstract syntax is the real structure of a language. The C grammar only gives you the concrete syntax.
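As a rough sketch of the distinction (the node kinds and field names below are invented for illustration, not taken from any particular compiler), the abstract syntax of a small expression subset of C could be captured by a tree type that keeps the structure and discards the punctuation:

/* Hypothetical abstract-syntax node for a tiny expression subset of C. */
enum kind { K_CONST, K_IDENT, K_BINOP };

struct node {
    enum kind kind;
    union {
        long value;                 /* K_CONST: literal value     */
        const char *name;           /* K_IDENT: identifier        */
        struct {                    /* K_BINOP: binary operation  */
            int op;                 /* operator token, e.g. '+'   */
            struct node *lhs, *rhs;
        } bin;
    } u;
};

/* "a + 2*b" becomes BINOP('+', IDENT a, BINOP('*', CONST 2, IDENT b));
   the spacing and precedence rules of the concrete syntax are gone,
   only the structure remains. */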

To build the abstract syntax, you must know the real structure of the language. It remains a fact that most people programming in the C language (including me) have only a superficial understanding of the language. We build up our own assumptions, and usually this is enough to produce working programs. Often our assumptions are invalidated, producing incorrect programs. It is not surprising to see how much difficulty the C language has caused. How many C-based free software projects are actually correct upon the first writing? Look at all the projects that are bug-ridden and constantly being patched, and then look at what language they are written in. If it is not the C language, it will be one of almost equal complexity.

In writing software that analyzes software written in the C language, I see no way other than absorbing every detail in the spec. It might not all be relevant; once I see what the language really is, I can make decisions. I can also make assumptions which simplify things greatly. The greatest one is that the program is correct according to the C language's syntax and semantics. I only need to worry about correct programs, and can exit when the slightest error occurs.

Anyway, this program is close to done. Does anyone have experience selling free software? I only want to sell it once; then people can modify it as they please. I think maybe a price reserve would be the way to go, with perhaps some sample output.

mpr, If I knew specifically what I wanted, a small hack might do. I'm not too skilled with Unix, so small hacks are probably out of my reach. Language analysis I know very well, so this seems like the right approach for me.

Once I have the language in an abstract form, I can do sophisticated transformations. Some experimentation will probably be required to figure out exactly what I want to do with the information once it is parsed. I have several ideas about what makes the source hard to read and how I can alleviate that.

If successful, this will be a useful tool for mining C source code in general, not just the Linux kernel.

To me, C is very much a legacy language. It should have been left behind long ago, or perhaps never have been created.

tk, I've read a lot of the spec. Pretty much every detail is required if you want to really make the most use of the information you are parsing.

C arcana

In the process of writing a parser, I found some little-known (or at least little-used) facts.

One interesting one: the tokens in the first row below (the so-called digraphs) mean the same thing, syntactically, as the corresponding tokens in the second.

<:  :>  <%  %>  %:  %:%:
[   ]   {   }   #   ##
Using this fact, the following is perfectly valid C:
%:include <stdio.h>

int main() <%
    int a<::> = <%1, 2, 3%>;
    return 0;
%>
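For reference, here is the same program with the digraphs replaced by the usual punctuators:

#include <stdio.h>

int main() {
    int a[] = {1, 2, 3};  /* a<::> is a[], <% %> are braces */
    return 0;
}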

Linux deobfuscator

This is coming along. tk said writing a C parser is "quite tractable". That is quite an understatement. The language itself is described in no less than 160 pages of concise ANSI verbiage. Add to this the GNU C extensions, of which there is no small number.

And I am now quite used to a language which is described in one page, so this is quite a lot to keep straight.

Too bad there is not more interest in this. It is probably mostly of interest to people creating systems, of whom there are few.

It might be of use to people who want to get involved with Linux too. However, I don't know how much of the information someone would need to get involved with development I will keep around.

I expect understanding a part of Linux will take 10% of the effort it would take using just an editor. Most of this will come from reducing the amount you need to read; some additional gains will come from locating information quickly. I think it would be good to put things in pop-up windows. Then you can build a context for understanding a function by viewing the related data structures and functions.

Posted my previous entry as an article at the request of tk.

17 Jul 2002 (updated 18 Jul 2002 at 00:56 UTC)
Linux deobfuscation project

Regardless of the particular merits of Linux, a large body of device drivers has been written for it. So much so that when designers of new operating systems embark on creating a system, they often opt for building on top of Linux, despite the incredible drawbacks that come with this approach. Furthermore, for some devices, Linux may be the only reference for programming the device. This is due partly to the fact that manufacturers' support of "open source" means developing a Linux driver, not releasing the specifications as they should have done.

Reading the source can be quite difficult for all but the most seasoned Linux hackers. This is due to many factors: a certain coding style, ubiquitous comments, and the general clumsiness of the C language.

My idea is to mechanically convert the Linux sources to a form that at once represents their true structure and more directly represents their semantics. A particular machine configuration will be assumed; therefore, preprocessing can be done in its entirety. Comments and macros will be left as hypertext annotations. I think the file structure should be completely broken down, leaving just a call graph (or forest). Types will be completely stripped and left as an annotation, with a similar graph representation.
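As a rough, hypothetical illustration (the macro, register offset, and function below are invented for this sketch, not taken from the kernel), here is a driver-style fragment and the kind of preprocessed, annotated form it might be presented in:

/* Before: macro-heavy source, roughly as it appears in a driver. */
#define REG_CTRL      0x04
#define CTRL_ENABLE   0x01
#define chip_writel(v, base, off) \
        (*(volatile unsigned int *)((char *)(base) + (off)) = (v))

static void chip_enable(void *base)
{
        chip_writel(CTRL_ENABLE, base, REG_CTRL);
}

/* After: preprocessing done in its entirety, with the macro call kept
   only as a hypertext annotation on the expanded expression:

   static void chip_enable(void *base)
   {
           (*(volatile unsigned int *)((char *)(base) + (0x04)) = (0x01));
   }
   annotation: chip_writel(CTRL_ENABLE, base, REG_CTRL)
*/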

Currently this only exists as an idea. If you are interested in funding this work, please contact me. I say this because it is not easily done. It would require some time to do right, and that is perhaps the only thing preventing me from doing so.

--- Update: grammatical corrections.

tk wrote:
Is there a non-OO system architecture which can facilitate the creation of object-like interfaces for the entire system? This is exactly what I've been trying to figure out.

I can name three: Lisp, PostScript, and Forth. Lisp on the Lisp Machines apparently had an object-like interface. There was the PostScript-based NeWS windowing system, which facilitated object-like interfaces. PostScript is of course a special case of Forth, so such interfaces could easily be developed in Forth. I have even heard of Smalltalk-like GUIs being developed by members of the Forth community, although I've never seen these in person.

One common thread between Lisp and Forth, besides the simple syntax, is the extensibility of the language. This is the key feature which allows these languages to be applied easily to any problem.

The NeWS system, on the other hand, was far more isolated: most of the facilities were built into the window server and were hence unchangeable.

nymia, I noticed there is a lot of activity around hacking the Xbox. It seems quite close to what you want: commodity hardware, PC-like. And people seem to be making progress hacking the hardware.

tk, A bit belated. Just to note that the user interface may be object-like and user-friendly without the operating system necessarily being object-oriented.

Applications and processes are probably left over from the old time-sharing systems from which Unix is derived. There is no reason to organize the system in this particular way, except perhaps a lack of imagination on the part of modern OS designers. The notion of an application also finds strong favor with proprietary software companies: a kind of little fortress separated from the rest of the computer, which they can control and resell to you an indefinite number of times.

I can't say an OO operating system is worth it. There doesn't seem to be any concrete reason to organize the system this way except the designer's preference. New systems certainly have a lot of potential, if only the hardware barrier can be broken through.

vsync

If anyone is familiar with programming the vertical sync interrupt of video cards, please contact me. I'm having a bit of trouble. This is a more modern card, the Matrox G400.
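For context, the only approach I can sketch with any confidence is the old polled one: a minimal example assuming Linux user space with port I/O privileges and a card that still decodes the legacy VGA status port, rather than the G400's own interrupt machinery.

#include <stdio.h>
#include <sys/io.h>                 /* ioperm, inb (Linux, x86) */

#define VGA_STATUS 0x3DA            /* VGA Input Status Register 1 */
#define VRETRACE   0x08             /* bit 3: vertical retrace in progress */

/* Busy-wait for the start of the next vertical retrace. */
static void wait_vsync(void)
{
    while (inb(VGA_STATUS) & VRETRACE)    /* let any current retrace finish */
        ;
    while (!(inb(VGA_STATUS) & VRETRACE)) /* wait for the next one to begin */
        ;
}

int main(void)
{
    if (ioperm(VGA_STATUS, 1, 1) < 0) {   /* port access requires root */
        perror("ioperm");
        return 1;
    }
    wait_vsync();
    printf("vertical retrace\n");
    return 0;
}

What I actually want is the interrupt, which is card-specific; this sketch is only the fallback I understand.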

raph, I have to ask, what problem are you trying to solve?

If you care about performance, you should know that the fastest display engines know specifically what can change, how it can change, what needs to be redrawn, the best way to redraw, etc.

Also, this generic tree structure will of course add much bloat to any application willing to use it. An optimized representation will always beat it, sometimes by orders of magnitude. Creating such a library is a sure way to encourage bloatware and mediocrity.

I believe this line of thinking, and similar thinking, is perhaps the true source of bloat and complexity in today's software. If people would only address the problem at hand...

tk, Are we really doomed to Windows and Unix forever? Or is there a way out? I feel the latter is true; if I didn't, I'd probably give up programming. You must feel it too, or else why would you work on Delitalk? colorForth, Chuck Moore, and others, I believe, offer the intellectual strength to break through the bloatware stranglehold. Not to mention the C stranglehold, the language that has held so much back.

