Older blog entries for zw (starting at number 3)

Took the day off. Rode around on my bicycle postering for the Douglas Hofstadter talk. He is speaking at UC Berkeley on the 21st of April, from 6 to 8 PM, in 2050 VLSB (Valley Life Sciences Building). If you're in the Bay Area, you should come. More info at the UC Cognitive Science Students' Association, which is sponsoring the talk.

My replacement burner arrived. I can cook dinner again without worrying about blowing myself up.

Update to previous entry: making the directive scanner consume newlines breaks absolutely everything. Have implemented a disgusting kluge (decrement line counter before calling the printer) instead. This works.

The ancient stove in this house has decided that its right front burner will not light unless a match is applied. If you just turn it on, it floods the area with gas which then explodes, setting fire to the house. Well, I assume that's what would happen; so far I have managed to catch it in time.

A few hours' driving around in Oakland found me an appliance store with a Wise Old Mechanic who managed to locate and order me a replacement burner for only $50. I was tempted to buy a whole new stove, but I doubt the landlord would appreciate it.

Today I reworked the generation of linemarkers. The output of CPP has lines in it that look like

# 12 "foobar.h"

that tell the compiler where each chunk of the preprocessed file came from. If you don't intend to generate a preprocessed file, these are useless - you can grab the info straight from CPP's data structures. But they are generated deep down in the guts of the preprocessor and you can't get rid of them...

...well, now you can. The "reader" library interface doesn't generate them anymore, and there's a new "printer" interface that sticks them in right before output.
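A minimal sketch of what such a "printer" might look like, to make the split concrete. This is not the actual CPP code: `struct printer`, `printer_needs_marker`, `format_linemarker` and `printer_advance` are hypothetical names, and the marker format is simplified to just the line number and file name shown above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical printer state: where the last printed line came from. */
struct printer {
    const char *file;   /* file of the last line printed; NULL before any output */
    unsigned int line;  /* line number expected for the next output line */
};

/* Nonzero if a linemarker must precede a line coming from (file, line):
   nothing printed yet, a different file, or a jump in line numbers. */
int printer_needs_marker(const struct printer *p, const char *file,
                         unsigned int line)
{
    return p->file == NULL || strcmp(p->file, file) != 0 || p->line != line;
}

/* Format a marker such as  # 12 "foobar.h"  into buf.
   Returns what snprintf returns: the length excluding the NUL.
   Simplification: the file name is not escaped. */
int format_linemarker(char *buf, size_t size, unsigned int line,
                      const char *file)
{
    return snprintf(buf, size, "# %u \"%s\"\n", line, file);
}

/* Record that one line from (file, line) has just been printed. */
void printer_advance(struct printer *p, const char *file, unsigned int line)
{
    p->file = file;
    p->line = line + 1;
}
```

The point of the design is that the reader only has to report (file, line) for each chunk; whether a marker is needed is decided at the last moment, by whoever writes the text out.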

Structurally, this is a good deal cleaner than what we had. It works great too, except that it gets all the line numbers wrong. This is not really the fault of the "printer", but a bad interaction with a whole different area of the code.

There's a special internal routine to scan directive lines. Among other things, it refuses to scan past the end of a line - except what it really does is refuse to consume the end of a line. The "printer" has to emit a linemarker at the beginning of each #included file. It will not get a chance to do so unless it's invoked before the #include processor returns. But at that point, the newline ending the #include line has not been consumed. Therefore that newline will be counted twice.

It's even worse than that - the only place the "printer" can get control can't distinguish between #include and anything else, which means every single directive line will be counted twice for line-numbering purposes.

The fix is to make the directive scanner consume newlines. That will be tricky. The various directive handlers count on being able to read that newline multiple times and not get messed up; that works fine when it's never consumed, but if we want it to be consumed exactly once... harder.
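A toy model of the double count, and of what consuming the newline would change. It is only an illustration under loud assumptions: `count_line_ends` is a made-up function, "directive" just means a line starting with `#`, and the printer's bookkeeping is reduced to a single increment. The real code paths are far more tangled.

```c
#include <stddef.h>

/* Toy model, not CPP code. Returns how many line ends the model believes
   it has passed after reading all of src. */
unsigned int count_line_ends(const char *src, int scanner_consumes_newline)
{
    unsigned int line_ends = 0;
    const char *p = src;

    while (*p != '\0') {
        int is_directive = (*p == '#');

        /* Scan to the end of the line without consuming the newline. */
        while (*p != '\0' && *p != '\n')
            p++;

        if (is_directive) {
            /* Assumed: the printer gets control here and accounts for
               the end of the directive line. */
            line_ends++;
            if (scanner_consumes_newline && *p == '\n') {
                p++;            /* newline consumed exactly once */
                continue;
            }
        }

        if (*p == '\n') {       /* ordinary path: consume and count */
            p++;
            line_ends++;
        }
    }
    return line_ends;
}
```

For a four-line input with two directive lines, the model reports six line ends when the scanner leaves the newline alone and four when it consumes it. What the model leaves out is exactly the hard part: the handlers that expect to see that newline again.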

So today I'm working on token lists. CPP has always been strictly textual, but C the language is based on tokens. Now CPP is going to have the same concept.

The basic idea is that we scan one line at a time and convert it to tokens. These are as close as possible to the C front end's concept of tokens. We try to make the tokenizing free of context sensitivity, and we can do that everywhere except on directive lines. Then macro expansion and directive processing happen on the pretokenized line, which produces another line. Then we convert that back to text and feed it to the compiler. (Longer term, the converting back to text won't happen either, but one step at a time.)

But for today, all I implemented was the basic data structure and helper functions. The next step will be to wrap the existing lexer in this - that will hopefully happen by Wednesday. Then I will begin bashing on the macro expander.
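The token-list idea might be sketched roughly like this. It is an assumption-laden illustration, not the data structure that actually went into CPP: the type and function names are invented, and the tokenizer is deliberately crude (no strings, character constants, comments, or multi-character operators).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum tok_type { TOK_NAME, TOK_NUMBER, TOK_OTHER };

struct token {
    enum tok_type type;
    const char *start;   /* points into the source line, not owned */
    size_t len;
    int space_before;    /* nonzero if whitespace preceded the token */
};

struct token_list {
    struct token *toks;
    size_t used, cap;
};

void toklist_init(struct token_list *l) { l->toks = NULL; l->used = l->cap = 0; }
void toklist_free(struct token_list *l) { free(l->toks); toklist_init(l); }

/* Append one token, growing the array as needed. Returns 0, or -1 on OOM. */
int toklist_append(struct token_list *l, struct token t)
{
    if (l->used == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 8;
        struct token *n = realloc(l->toks, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        l->toks = n;
        l->cap = ncap;
    }
    l->toks[l->used++] = t;
    return 0;
}

/* Convert one line (up to '\n' or NUL) to tokens. Returns 0, or -1 on OOM. */
int tokenize_line(const char *line, struct token_list *out)
{
    const char *p = line;
    for (;;) {
        struct token t;
        t.space_before = 0;
        while (*p == ' ' || *p == '\t') { p++; t.space_before = 1; }
        if (*p == '\0' || *p == '\n')
            return 0;
        t.start = p;
        if (isalpha((unsigned char)*p) || *p == '_') {
            t.type = TOK_NAME;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {
            t.type = TOK_NUMBER;   /* rough approximation of a number */
            while (isalnum((unsigned char)*p) || *p == '.') p++;
        } else {
            t.type = TOK_OTHER;    /* single punctuation character */
            p++;
        }
        t.len = (size_t)(p - t.start);
        if (toklist_append(out, t) != 0)
            return -1;
    }
}

/* Convert a token list back to text, one space wherever whitespace was
   seen. Stops early rather than overflow buf. Returns the length written. */
size_t toklist_spell(const struct token_list *l, char *buf, size_t size)
{
    size_t n = 0, i;
    assert(size > 0);
    for (i = 0; i < l->used; i++) {
        const struct token *t = &l->toks[i];
        int sp = (t->space_before && i > 0);
        if (n + t->len + (size_t)sp >= size)
            break;
        if (sp)
            buf[n++] = ' ';
        memcpy(buf + n, t->start, t->len);
        n += t->len;
    }
    buf[n] = '\0';
    return n;
}
```

Tokenizing `x  = foo(42);` this way gives seven tokens, and spelling them back gives `x = foo(42);`. Macro expansion and directive processing would sit between those two steps, working on the list instead of on raw text.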
