Older blog entries for mslicker (starting at number 30)

15 Sep 2002 (updated 16 Sep 2002 at 03:10 UTC) »
mbp, Thanks for your last couple of posts. I am reminded that Computer Science is much closer to Physics than Mathematics. Machines are bound by physical reality, and while mostly under our control, are subject to unexpected events which makes "proof" unattainable.

The mathematical proof, is still a useful techinique in eliminating most expected bugs. If we prove that a program computes the gcd, we can have little doubt of it's function on a real machine, assuming the input integers are within the machines limits.

raph, writes:

I believe that it is possible to design programming languages and runtimes to be much, much simpler.
It's already been done, it's called colorForth.

Compatibility with existing designs is also important. Internal consistency can be proved, but compatibility can only be tested. This principle extends to file formats, programming languages, instruction sets, compression algorithms, communication protocols, file systems, all manner of I/O techniques, API's, etc. In other words, most of the things you care about when building a real system.
GNU is faith based computing. There is no way any one person check all these things, not even close. You have trust the work of others to a high degree, and also make broad assumptions in creating new work.

--- Update

graydon, Actually that was a quote of raph. My apology for the confusion, my quotation style was ambiguous.

response to tk:

Forth

It is probably the nature of Forth and also the excelent design of colorForth that I don't worry about standards and compatability so much. For other things, like Unix, as you say the cost may be huge.

In Forth the main thing that can foul things up is the semantics of the operators. As I've pointed out previously, the operators of colorForth are quite evolved. The block format is unlikely to change, it was designed to be extensible, so embeding different types of information in source code is entirely possible. I think this idea is quite neat, you could embed a string from another character encoding, or any other custom data. You could even include the code which extends the compiler inside your application, so there is no dependency.

So short term anoyances are unlikely with colorForth. In the long term, if the system has really changed so much, it is likely the hardware has changed as well. In this case your code probably isn't relevant. It might be best to start from scratch. I've noticed the changes in Chuck's Forth occur with changes in the hardware. The last 30 years have seen many hardware changes. Forth is considered the interface to the machine, so it is quite understandable for the Forth to change as well.

Portability

colorForth ignores portability with the result of extreme simplicity. Things are generally easier to write ignoring portability. This comes at the cost of the effort to port the code to different platforms. If the number of platforms is small, the cost is also small. Unix/C in constrast has a gigantic portability apperatus. The C language has complex type system, this probably helps portabillity to a degree, but does not really make C much more portable then macro-less colorForth. To accomidate for C's pitfalls there is also build systems, conditional compilation, and portability layers, more complexity. Furthermore, there is software compatability. C programs are often made compatible with multiple versions of Unix and sometimes Windows, even more complexity. The result is clear. Unix/C programs trade a great amount of simplicity for software/hardware compatability. Now the question is, which way is a greater burden on the programmers? From my experience it has been Unix/C way. Most code I don't even bother looking at, the cost of changing it is much greater than the benifit. On the other hand colorForth has not yet aquired what I consider basic functionality (networking, web, email). It will be intersting to see how well the colorForth method scales. I have a good amount of confidence in it at least.

Executing tree structures

On the pentium, I don't know how good that techinique would work. I'm quite wary of branches on the pentium in my own code. I had an oportunity to use this techinique recently, building a tree structure to decode a huffman coded stream. I decided not to. Jeff Fox used this techinique on a stack machine with good results, and aparently Phil Koopman did too, from the description in his book. When I get around to it, I'd like to read his thesis, which deals with the same subject.

Forth and standardization

tk, Language standardization is not for the benifit of coders. It is completely for marketing sake. The computing industry wants replacable parts. The last the thing they want is to be dependent on the talent of individuals. If you program in Forth and you leave, they believe replacing you with a "Forth programmer" will be just as good. But this is only an illusion, good programmers can program in any language, and can pick up new ones quickly. Some Forths, like colorForth, could be picked up within a day given proper training, even for programmers with no prior Forth experience. Language standardization only serves to gloss over differences of programmers and languages.

The Forth standard, and probably all language standards, was made to apeal to this industry. The ironic thing is that now that Forth has a standard it probably used less overall, and when used lacks many of the benefits of the original.

While I make no claims what real Forth is. colorForth is the one the Forth that got my attention. Previously, I did not have much interest in Forth, even though Forths are readily available for GNU. This makes me believe Forth is not just a superficial language, a syntax, but there is much more. There is a philosophy and also an aesthetic for simplicity and elegance. The virtual machine, that was avoided in the ANSI spec, is the embodiment of Forth machine philosphy. These machines are not only ideal for the execution of Forth, but show advantages for Rule-based systems and functional programming. colorForth seems well in keeping with the spirit and philosophy of Forth. It is encouraging that the lastest effort from the creator is, after many years, colorForth, and not "Visual Forth.NET Development Studio 2000 Professional Edition".

Formal methods and Dijkstra

I read through many of Dijkstra's papers recently. I have to agree with much of what he has said.

The skills to write good mathematical proofs and good programs are the same. And also, from experience, my math classes have been far more valuable than my computer science classes (with some exceptions). He said computer science students should be diciplined in mathematics and I think he is right.

Also he has said some things about elegance and simplicity. In math generally the more elegant and simple proofs are the ones that are taught. One math teacher of mine would give solutions. Somtimes people would turn in ten pages of proofs for an assignment. When he handed out his solutions the proofs fit on exactly one page. Programming problems also admit many solutions, and we should try to find the most elegant simple method of solving them.

Formal methods could be useful in programming, but not as a mechanical device to automatically check the correctness of your program. The key I think is developing highley modular programs. It is easy to check the correctness of a short function, you can even do it mentally. When functions get long, that is when you must get out the pencil and paper and follow each step.

Also the simpler a program is, the easier to check it's correctness. Often the problem we are addressing is one of our own choosing. Redefining the problem can often lead to simpler programs, and ones which are much easier to verify. This is a tremendous advantage which is often ignored in writing programs.

tk

Glad to stop. Our debate was turning more into an argument. I like debates, but not arguments.

Also I like to say that I don't care what programming language people use. I would much rather create than criticize. My remarks on C are mostly prompted by current attempts to parse it, which have taken far more work than I would like.

tk I await you remarks reguarding colorForth.

You seem to use whatever version of C suits you argument. K&R C and standard Cs do not have integration of machine code. Therefore languages like GNU C must implement them as a hack extention. This is not what I call clean integration. colorForth allows machine language integration by design. This is the whole purpose of the colorForth macros.

On compactness, fine let us choose a implementation then compare. I choose colorForth, I await you choice of C implementation.

On interactive development, I didn't state that colorForth has a debugger. Interactivety is achieved through a extremely tight edit/compile/test cycle, and entry points to every word.

Where did I seperate a computer language from it's implmentation? Look at ObjectiveCaml for a typeded infix language which is much cleaner than C, and even beats C on occasion in performace.

Please show me the abuguity in the colorForth specification. Are you comparing this to C? Many paragraphs of the C specification say the handling of a certain construct is implementation defined, how much more ambiguous can you get?

I have some advice for you tk, try implementing a parser for the C language. You said this was "quite tractable", lets see it. I think if you did this, you might actually understand how bogus the design of the C language is. You might also understand my abhorrence for it.

For me, I try to avoid design disasters where I can. I try to avoid projects which have such a great inertia. I like software that is small, fast, and nimble; changable. Software should be built to last, but not built to stay. Free software has many examples of software built to stay. Mozilla, gcc, Linux, Gnome, KDE, are all built to stay, if they last it is not because of design elegance but because they are unmovable.

My quote of the Tao:

In pursuit of knowledge,
every day something is added.
In the practice of the Tao,
every day something is dropped.
Less and less do you need to force things,
until finally you arrive at non-action.
When nothing is done,
nothing is left undone.
tk You're right, Forth has not stabalized. It does change, it is the nature of the language and the philosophy of the language to change and adapt.

I wouldn't say that Chuck Moore is reinventing the language. The basics always stay, definitions and stacks. The operators have evolved from his chip design experience over the last 20 years. The use of color is a new innovation, a simplification of previous notation. Blocks have been around Forth for quite while. There are a lot of ideas and innovations in the implementation, but it could well be implemented much differently without change from the user's perpective.

We clearly differ on the philosophy of computer languages. To me, having so many computer languages kept alive just means a much greater maintenece cost. If there are better methods people should be using them. A legacy language, is just one which new software should not be written in.

C is known primarily as a systems language. Compared to Forth it is a very poor systems language. Forth can interface hardware much more directly, more compactly, and allows clean integration of machine language code. It also features interactive development, which is ideal for debuging hardware device code.

C as an application language is beaten by many languages including Forth. C provides you with one set of abstractions with a clumsy syntax. As a result, higher level operations are very clumsy compared to languages which support these operations directly or extensible languages which allow you to add these operations in an integrated way.

C as an Algol family language seems poor. I'm not up on Algol languages, but I would be very suprised if C has not been surpassed. The fact that it takes two languages for C for what should easily be accomplished in one, makes C an easy target for improvement.

There are some cases where writing new C code makes sense. The most usable free software systems and many commercial systems are C based. When writing code for these systems, C is almost always available, and many libraries are written in C.

New systems should avoid the use of C, as clearly better methods exist.

Of course this is only my perpective. I have no control over what people use. Language choice is mostly guided by popularity anyway, not techinical merit.

C vs. colorForth

tk, I wasn't making a comparison, just pointing out that C is very complex. You brought colorForth into comparison, which I'm happy to discuss.

First on the maturity of the languages. You portray C as a stabilized language, and colorForth as a completely new language. This is not true. For one, Forth predates both B and C. Just as you can track the evolution of the C language from BCPL to B and to C and then later Cs. The same can be done for Forth. The main difference in the evolution is in the philosophy. From B to C, the language has accumulated much complexity in a short period. Later Cs just agglomerate new features over time, this is nature of the standardization process, features start as extensions and gradually work their way into the language. Forth, in contrast, removes features, adds features, simplifies, it evolves in a directed way with the changing nature of machines and the tasks they perform.

colorForth is the latest step on the evolution of Forth. It is in keeping with the principals of Forth from its inception. This is important to note, since there are other languages which go by the name of "Forth", which don't at all keep with the principals. In technical terms, they are a branch or derivation, but not within the true line of the language.

colorForth compares quite well with B in terms of simplicity too. I believe one reason is syntax. B, C and pretty much all infix languages sacrifice a great amount of simplicity for familiarity. Whether for popularity, or for the designers preference, infix is large drawback in programming language design.

Now, onto your specific comparisons:

  • Locals

    For locals it is true that colorForth locals are unnamed stack values. That is actually an important feature of the language. Locals are implicit, this offers the following advantages:

    • You reduce the syntax required of the language, makes the compiler simpler, makes the language easier to learn.
    • You increase the terseness of the language, makes the definitions shorter and clearer.
    • A well kept stack is optimal for execution on stack machines, to a lesser degree on register machines. Complex optimization logic is simply not needed for compilation, even on register machines.
    • colorForth always keeps a one to one correspondence between source and machine code. This makes compilation and execution behavior of code you write predictable. Locals would break this correspondence.

    The disadvantage, you pointed out. You have to keep track of stack. This is a learned skill, some programmers are very set in their ways and won't make the effort. For new programmers, I've heard they take to it quite naturally.

    Programs are typically written in a declarative style. Definitions are usually short, one or two lines. The idea is that though you have to keep track of the stack, you usually don't have to keep track of very much. Chuck Moore recommended two parameters be passed to a word on average.

    I should point out colorForth doesn't completely eliminate variables. State variables exist, these are similar to global variables of other languages, except they live inside the source code, and when you save, the values get saved too.

    One program I've seen sets an interrupt for the PIC and increments a variable in the source 18.2 times per second. Viewing the source in the editor will show the source changing, incrementing the variable.

  • Syntax

    Your comparison of operators/syntax is not correct. B and C have a structured BNF syntax, and complex compile/run time semantics. B is described in roughly 17k text (from "Syntax" until "8.0"), the latest C far more. colorForth has linear syntax and simple compile/run time semantics, described in about 3k text. Describing an equivalent set of operators would require little more, since these built on the base semantics.

    The library comparison is not useful, since they do different things. B is stream/file/unix oriented. colorForth is interactive graphics application oriented. Also some the words you include are colorForth applications like the editor (e, edit), block management (erase, copy), interpreter (accept), booting the system(boot, warm). Some of those aren't accessible outside the kernel (pack), some don't exist (font, clri). Some of the important ones, which would make a useful comparison (for operators only), are defined in colorForth source code.

  • Documentation

    The documentation that is lacking is a word list, actually promised by Chuck Moore on this page. In the mean time, the community has pitched in and created a page documenting known words.

Although your comparison wasn't specifically with the C language, Chuck Moore has a critique/comparison of the C language with colorForth. I agree with the points he has made.

tk, Language is not just syntax, but also semantics. Further, there is concrete syntax and then there is abstract syntax. Concrete syntax only serves a mechanical purpose, to parse a language correctly and efficiently. Abstract syntax is the real structure of a language. The C grammar only gives you the concrete syntax.

To build the abstract syntax, you must know the real structure of the language. It remains a fact, that most people programming in the C language (including me) only a have a superficial understanding of the language. We build up our own assumptions, and usually this is enough to produce working programs. Often our assumptions are invalidated, producing incorrect programs. It is not suprising to see how much difficulty the C language has caused. How many C based free software are actually correct upon the first writing? Look at all the projects that are bug ridden, and constanly being patched, and then look at what language they are written in. If it is not the C language, it will be one of almost equal complexity.

In writing software that analyzes software written in the C language, I see no way other than absorbing every detail in the spec. It might not all be relavent, once I see what the language really is, then I can make decisions. Also I can make assumptions which simplify things greatly. The greatest one, is that the program is correct acording to the C language syntax and semantics. I only need to worry about correct programs, and exit when the slightest error occurs.

Any way, this program is close to done. Does anyone have experience selling free software? I only want to sell it once, then people can modify it as they please. I think maybe a price reserve would be the way to go, with perhaps some sample output.

mpr, If I knew specifically what I wanted, a small hack might do. I'm not too skilled with Unix, so small hacks are probably out of my reach. Language analysis I know very well, so this seems like the right aproach for me.

Once I have the language in an abstract form, then I can do sofisticated transformations. Probably some experimentation is required to figure out exactly what I want to do with the information once parsed. I have several ideas of what makes the source hard to read and how I can alleviate.

If sucessful, this will be a useful tool to mine C source code, not just the Linux kernel.

To me, C is very much a legacy language. It should have left behind long ago, or perhaps should have never been created.

tk, I've read a lot of the spec. Pretty much every detail is required if you want to really make the most use of the information you are parsing.

C arcana

In the process of writing a parser, I found some little known (or at least little used) facts.

One interesting one, the following two lines mean the same thing syntactically.

<:  :> <%  %> %: %:%:
[   ]  {   }  #  ##
Using this fact, the following is perfectly valid C:
%:include <stdio.h>

int main () <% int a<::> = <%1, 2, 3%>;

return 0; %>

Linux deobfuscator

This is comming along. tk said writing a C parser is "quite tractable". That is quite an understatement. The language is itself described in no less than 160 pages of consise ANSI verbiage. Add to this the GNU C extentions, which there is no small number.

And I am now quite used to a language which is described in one page. So this quite a lot of things to keep strait.

To bad there is not more interest in this. Probably mostly of interest to people creating systems of which there are few.

Might be of use to people who want to get involved with linux too. However, I don't know how much information I will keep around which would be required for someone to get involved with the development.

I expect understanding a part of linux will take %10 of effort that it would take just using a editor. Most of this will just be in reducing the amount you need to read, some additional gains will be in locating information quickly. I think it would be good to put things in pop up windows. Then, you can build a context for understanding a function, by viewing the related data structures and functions.

21 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!