11 Aug 2000 whatever   » (Journeyer)

csed/cgrep was coming along quite well until the last stages, where it was taking far too much effort to convert the output of the earlier lexical parsing stages into more complicated structures like tables of variable types. There were also problems with how to accommodate incremental changes in the source code being analysed: how to work out which caches should be updated, which buffers needed to be flushed, etc.

I hammered away at this for weeks, trying to come up with an insight into the problem that would stem the rising tide of exceptions going into the parser with the addition of every new syntax element. Was there some elegant recursive algorithm I could use? Was it better to visualise cache elements as a list of states, a tree of states, or a nested structure of states? For every algorithm I could think of that satisfied the most common case, the remaining list of exceptions still grew exponentially with every new facility I added.

I just couldn't figure out how to add all the features I wanted without reaching impossible levels of complicated code! I downloaded as many parsers as I could and examined them. It appeared that nobody else had solved the problem either, as they were all huge, complicated, and hairy as well. There were some very nice small ones (e.g., Lua), but I really want my parser to comprehend full ANSI C with GNU extensions.

I was starting to think that perhaps I had bitten off more than I could chew, even though the goal is pretty simple: embed sed and grep inside a parser, with the ability to handle on-the-fly code changes. How could something so simple-sounding be such a problem?

Since I wasn't getting anywhere with my attempts to save the code, I decided to throw it all away and start again. Without the blinkers imposed by the objective of "avoid rewriting! save the code!", I had this incredible few hours where I suddenly realised exactly what I was doing wrong and where I had gone wrong.

In my quest to keep things simple, I had over-simplified the core of my design to the point where it wasn't sufficient for the task. The rest of the program was difficult to write because it was trying to make up for a core deficit in the design.

Making the core design more complicated simplified the rest of the program so dramatically that the final amount of code I expect to write has been halved.

The learning process sucks when I just want to accomplish a task!

On another topic... thanks to darkewolf for certifying me! It's always nice to receive positive feedback. :)
