Older blog entries for mslicker (starting at number 27)

Formal methods and Dijkstra

I read through many of Dijkstra's papers recently. I have to agree with much of what he has said.

The skills to write good mathematical proofs and good programs are the same. And also, from experience, my math classes have been far more valuable than my computer science classes (with some exceptions). He said computer science students should be diciplined in mathematics and I think he is right.

Also he has said some things about elegance and simplicity. In math generally the more elegant and simple proofs are the ones that are taught. One math teacher of mine would give solutions. Somtimes people would turn in ten pages of proofs for an assignment. When he handed out his solutions the proofs fit on exactly one page. Programming problems also admit many solutions, and we should try to find the most elegant simple method of solving them.

Formal methods could be useful in programming, but not as a mechanical device to automatically check the correctness of your program. The key I think is developing highley modular programs. It is easy to check the correctness of a short function, you can even do it mentally. When functions get long, that is when you must get out the pencil and paper and follow each step.

Also the simpler a program is, the easier to check it's correctness. Often the problem we are addressing is one of our own choosing. Redefining the problem can often lead to simpler programs, and ones which are much easier to verify. This is a tremendous advantage which is often ignored in writing programs.

tk

Glad to stop. Our debate was turning more into an argument. I like debates, but not arguments.

Also I like to say that I don't care what programming language people use. I would much rather create than criticize. My remarks on C are mostly prompted by current attempts to parse it, which have taken far more work than I would like.

tk I await you remarks reguarding colorForth.

You seem to use whatever version of C suits you argument. K&R C and standard Cs do not have integration of machine code. Therefore languages like GNU C must implement them as a hack extention. This is not what I call clean integration. colorForth allows machine language integration by design. This is the whole purpose of the colorForth macros.

On compactness, fine let us choose a implementation then compare. I choose colorForth, I await you choice of C implementation.

On interactive development, I didn't state that colorForth has a debugger. Interactivety is achieved through a extremely tight edit/compile/test cycle, and entry points to every word.

Where did I seperate a computer language from it's implmentation? Look at ObjectiveCaml for a typeded infix language which is much cleaner than C, and even beats C on occasion in performace.

Please show me the abuguity in the colorForth specification. Are you comparing this to C? Many paragraphs of the C specification say the handling of a certain construct is implementation defined, how much more ambiguous can you get?

I have some advice for you tk, try implementing a parser for the C language. You said this was "quite tractable", lets see it. I think if you did this, you might actually understand how bogus the design of the C language is. You might also understand my abhorrence for it.

For me, I try to avoid design disasters where I can. I try to avoid projects which have such a great inertia. I like software that is small, fast, and nimble; changable. Software should be built to last, but not built to stay. Free software has many examples of software built to stay. Mozilla, gcc, Linux, Gnome, KDE, are all built to stay, if they last it is not because of design elegance but because they are unmovable.

My quote of the Tao:

In pursuit of knowledge,
every day something is added.
In the practice of the Tao,
every day something is dropped.
Less and less do you need to force things,
until finally you arrive at non-action.
When nothing is done,
nothing is left undone.
tk You're right, Forth has not stabalized. It does change, it is the nature of the language and the philosophy of the language to change and adapt.

I wouldn't say that Chuck Moore is reinventing the language. The basics always stay, definitions and stacks. The operators have evolved from his chip design experience over the last 20 years. The use of color is a new innovation, a simplification of previous notation. Blocks have been around Forth for quite while. There are a lot of ideas and innovations in the implementation, but it could well be implemented much differently without change from the user's perpective.

We clearly differ on the philosophy of computer languages. To me, having so many computer languages kept alive just means a much greater maintenece cost. If there are better methods people should be using them. A legacy language, is just one which new software should not be written in.

C is known primarily as a systems language. Compared to Forth it is a very poor systems language. Forth can interface hardware much more directly, more compactly, and allows clean integration of machine language code. It also features interactive development, which is ideal for debuging hardware device code.

C as an application language is beaten by many languages including Forth. C provides you with one set of abstractions with a clumsy syntax. As a result, higher level operations are very clumsy compared to languages which support these operations directly or extensible languages which allow you to add these operations in an integrated way.

C as an Algol family language seems poor. I'm not up on Algol languages, but I would be very suprised if C has not been surpassed. The fact that it takes two languages for C for what should easily be accomplished in one, makes C an easy target for improvement.

There are some cases where writing new C code makes sense. The most usable free software systems and many commercial systems are C based. When writing code for these systems, C is almost always available, and many libraries are written in C.

New systems should avoid the use of C, as clearly better methods exist.

Of course this is only my perpective. I have no control over what people use. Language choice is mostly guided by popularity anyway, not techinical merit.

C vs. colorForth

tk, I wasn't making a comparison, just pointing out that C is very complex. You brought colorForth into comparison, which I'm happy to discuss.

First on the maturity of the languages. You portray C as a stabilized language, and colorForth as a completely new language. This is not true. For one, Forth predates both B and C. Just as you can track the evolution of the C language from BCPL to B and to C and then later Cs. The same can be done for Forth. The main difference in the evolution is in the philosophy. From B to C, the language has accumulated much complexity in a short period. Later Cs just agglomerate new features over time, this is nature of the standardization process, features start as extensions and gradually work their way into the language. Forth, in contrast, removes features, adds features, simplifies, it evolves in a directed way with the changing nature of machines and the tasks they perform.

colorForth is the latest step on the evolution of Forth. It is in keeping with the principals of Forth from its inception. This is important to note, since there are other languages which go by the name of "Forth", which don't at all keep with the principals. In technical terms, they are a branch or derivation, but not within the true line of the language.

colorForth compares quite well with B in terms of simplicity too. I believe one reason is syntax. B, C and pretty much all infix languages sacrifice a great amount of simplicity for familiarity. Whether for popularity, or for the designers preference, infix is large drawback in programming language design.

Now, onto your specific comparisons:

  • Locals

    For locals it is true that colorForth locals are unnamed stack values. That is actually an important feature of the language. Locals are implicit, this offers the following advantages:

    • You reduce the syntax required of the language, makes the compiler simpler, makes the language easier to learn.
    • You increase the terseness of the language, makes the definitions shorter and clearer.
    • A well kept stack is optimal for execution on stack machines, to a lesser degree on register machines. Complex optimization logic is simply not needed for compilation, even on register machines.
    • colorForth always keeps a one to one correspondence between source and machine code. This makes compilation and execution behavior of code you write predictable. Locals would break this correspondence.

    The disadvantage, you pointed out. You have to keep track of stack. This is a learned skill, some programmers are very set in their ways and won't make the effort. For new programmers, I've heard they take to it quite naturally.

    Programs are typically written in a declarative style. Definitions are usually short, one or two lines. The idea is that though you have to keep track of the stack, you usually don't have to keep track of very much. Chuck Moore recommended two parameters be passed to a word on average.

    I should point out colorForth doesn't completely eliminate variables. State variables exist, these are similar to global variables of other languages, except they live inside the source code, and when you save, the values get saved too.

    One program I've seen sets an interrupt for the PIC and increments a variable in the source 18.2 times per second. Viewing the source in the editor will show the source changing, incrementing the variable.

  • Syntax

    Your comparison of operators/syntax is not correct. B and C have a structured BNF syntax, and complex compile/run time semantics. B is described in roughly 17k text (from "Syntax" until "8.0"), the latest C far more. colorForth has linear syntax and simple compile/run time semantics, described in about 3k text. Describing an equivalent set of operators would require little more, since these built on the base semantics.

    The library comparison is not useful, since they do different things. B is stream/file/unix oriented. colorForth is interactive graphics application oriented. Also some the words you include are colorForth applications like the editor (e, edit), block management (erase, copy), interpreter (accept), booting the system(boot, warm). Some of those aren't accessible outside the kernel (pack), some don't exist (font, clri). Some of the important ones, which would make a useful comparison (for operators only), are defined in colorForth source code.

  • Documentation

    The documentation that is lacking is a word list, actually promised by Chuck Moore on this page. In the mean time, the community has pitched in and created a page documenting known words.

Although your comparison wasn't specifically with the C language, Chuck Moore has a critique/comparison of the C language with colorForth. I agree with the points he has made.

tk, Language is not just syntax, but also semantics. Further, there is concrete syntax and then there is abstract syntax. Concrete syntax only serves a mechanical purpose, to parse a language correctly and efficiently. Abstract syntax is the real structure of a language. The C grammar only gives you the concrete syntax.

To build the abstract syntax, you must know the real structure of the language. It remains a fact, that most people programming in the C language (including me) only a have a superficial understanding of the language. We build up our own assumptions, and usually this is enough to produce working programs. Often our assumptions are invalidated, producing incorrect programs. It is not suprising to see how much difficulty the C language has caused. How many C based free software are actually correct upon the first writing? Look at all the projects that are bug ridden, and constanly being patched, and then look at what language they are written in. If it is not the C language, it will be one of almost equal complexity.

In writing software that analyzes software written in the C language, I see no way other than absorbing every detail in the spec. It might not all be relavent, once I see what the language really is, then I can make decisions. Also I can make assumptions which simplify things greatly. The greatest one, is that the program is correct acording to the C language syntax and semantics. I only need to worry about correct programs, and exit when the slightest error occurs.

Any way, this program is close to done. Does anyone have experience selling free software? I only want to sell it once, then people can modify it as they please. I think maybe a price reserve would be the way to go, with perhaps some sample output.

mpr, If I knew specifically what I wanted, a small hack might do. I'm not too skilled with Unix, so small hacks are probably out of my reach. Language analysis I know very well, so this seems like the right aproach for me.

Once I have the language in an abstract form, then I can do sofisticated transformations. Probably some experimentation is required to figure out exactly what I want to do with the information once parsed. I have several ideas of what makes the source hard to read and how I can alleviate.

If sucessful, this will be a useful tool to mine C source code, not just the Linux kernel.

To me, C is very much a legacy language. It should have left behind long ago, or perhaps should have never been created.

tk, I've read a lot of the spec. Pretty much every detail is required if you want to really make the most use of the information you are parsing.

C arcana

In the process of writing a parser, I found some little known (or at least little used) facts.

One interesting one, the following two lines mean the same thing syntactically.

<:  :> <%  %> %: %:%:
[   ]  {   }  #  ##
Using this fact, the following is perfectly valid C:
%:include <stdio.h>

int main () <% int a<::> = <%1, 2, 3%>;

return 0; %>

Linux deobfuscator

This is comming along. tk said writing a C parser is "quite tractable". That is quite an understatement. The language is itself described in no less than 160 pages of consise ANSI verbiage. Add to this the GNU C extentions, which there is no small number.

And I am now quite used to a language which is described in one page. So this quite a lot of things to keep strait.

To bad there is not more interest in this. Probably mostly of interest to people creating systems of which there are few.

Might be of use to people who want to get involved with linux too. However, I don't know how much information I will keep around which would be required for someone to get involved with the development.

I expect understanding a part of linux will take %10 of effort that it would take just using a editor. Most of this will just be in reducing the amount you need to read, some additional gains will be in locating information quickly. I think it would be good to put things in pop up windows. Then, you can build a context for understanding a function, by viewing the related data structures and functions.

Posted my previous entry as an article at the request of tk.

17 Jul 2002 (updated 18 Jul 2002 at 00:56 UTC) »
Linux deobfuscation project

Reguardless of the particular merrits of Linux, a large body of device drivers has been written for it. So much so, that when designers of new operating systems embark on creating a system they often op for building on top Linux, despite the incredible drawbacks that come from this approach. Furthermore, for some devices, Linux may be the only reference for programming a device. This is due partly to the fact that manufacturers' support of "open source" means the development of a Linux driver, not releasing the specifications as they should have done.

Reading source can be quite difficult for all but the most seasoned Linux hackers. This is due to many factors. A certain coding style, ubiquitous comments, and the general clumsyness of the C language.

My idea, is to mechanically convert the Linux sources to a form that at once represents their true structure and more directly represents their semantics. A particular machine configuration will be assumed. Therefore preprocessing can be done in it's entirety. Comments and macros will be left as hypertext annotations. I think file structure should be completely broken down, leaving just a call graph (or forest). Types will will be completely striped and left as an annotation, with a similar graph representation.

Currently this only exists as an idea. If you are interrested in funding this work please contact me. I say this because it is not something easily done. It would require some time to do right, and that is perhaps the only thing preventing me from doing so.

--- Update Grammatical corrections.

tk wrote:
Is there a non-OO system architecture which can facilitate the creation of object-like interfaces for the entire system? This is exactly what I've been trying to figure out.

I can name three: Lisp, Postsript, and Forth. Lisp on Lisp Machines aparently had an object-like interface. There was the Postcript News windowing system, which facilitated object-like interfaces. Postscript is of course a special case of Forth, so such interfaces could easily be developed in Forth. I have even heard of Smalltalk like GUIs being developed from members of the forth community, although I've never seen these in person.

One common thread among Lisp and Forth, besides the simple syntax, is the extensibility of the language. This is the key feature which allows these languages to be applied easily to any problem.

The Postcript News system on the other hand, was far more isolated most of the facilities were built into the window server and were hence unchangable.

18 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!