2 Aug 2004 dsnopek   » (Journeyer)


Lately I have been doing a whole bunch of hacking on Pyml. I really want to get the low-level stuff handled so that it is vaguely real-world usable. After that, I can hope for someone to even care about the userland API. Mainly, this affects the parser and compiler.

I started by working on the parser, which had a few known problems with determining the various PIs properly. It was based on a single, fairly complicated regular expression. This was great and mostly worked -- tweaking the regex ad infinitum could have fixed it. Unfortunately, it made accurately determining and maintaining the line numbers impossible, and that is very important if Pyml is ever going to have a line debugger. I tried a lex/yacc parser but settled on a quirky mode-based buffer thing. Not pretty, but it gives me lots of control.
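A minimal sketch of what a mode-based buffer scanner with line tracking could look like. This is not Pyml's actual parser: the `<?py ... ?>` delimiters, the `scan` function, and the chunk tuple layout are all assumptions made up for illustration. The point is only that switching between a "text" mode and a "pi" mode while counting newlines makes it easy to record the line each chunk starts on.

```python
def scan(source, open_tag="<?py", close_tag="?>"):
    """Split source into (mode, start_line, text) chunks.

    The delimiters are hypothetical; Pyml's real PI syntax may differ.
    """
    chunks = []
    pos = 0          # current offset into the buffer
    line = 1         # line number at `pos`
    mode = "text"    # either "text" or "pi"
    while pos < len(source):
        if mode == "text":
            end = source.find(open_tag, pos)
            if end == -1:
                # No more PIs: the rest of the buffer is plain text.
                end = next_pos = len(source)
                next_mode = "text"
            else:
                next_pos = end + len(open_tag)
                next_mode = "pi"
        else:
            end = source.find(close_tag, pos)
            if end == -1:
                raise SyntaxError("unterminated PI starting on line %d" % line)
            next_pos = end + len(close_tag)
            next_mode = "text"
        text = source[pos:end]
        if text:
            chunks.append((mode, line, text))
        # Advance the line counter past everything consumed, delimiters included.
        line += source.count("\n", pos, next_pos)
        pos = next_pos
        mode = next_mode
    if mode == "pi":
        raise SyntaxError("unterminated PI at end of input")
    return chunks


template = "a\nb <?py x = 1 ?>\nc\n<?py y = 2 ?>"
for mode, start_line, text in scan(template):
    print(mode, start_line, repr(text))
```

Because every chunk carries its starting line, a later compile step can try to emit code on the matching line of the generated source.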

The initial Pyml compiler simply generated Python source code and passed it to the CPython compiler. It put everything on its own line, since Python is very sensitive to white space. This worked but, again, the line debugger! So I started attempting to join code that would be on the same line using the ';' symbol. I found some pretty esoteric Python syntax errors this way! They could be avoided by putting everything in a line continuation inside an exec statement. Hey, this worked, but it created really ugly bytecode when disassembled using the "dis" module.
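To make the problem concrete, here is a small sketch of one such syntax error and of the exec workaround. It is written for Python 3, where exec is a function rather than the statement it was when this entry was written, and the helper names `compiles` and `wrap_in_exec` are invented for the example, not taken from Pyml.

```python
def compiles(source):
    """Return True if `source` compiles as a module, False on a syntax error."""
    try:
        compile(source, "<pyml>", "exec")
        return True
    except SyntaxError:
        return False


def wrap_in_exec(fragment):
    """Wrap a (possibly multi-line) fragment so it occupies one source line.

    repr() produces a string literal with newlines escaped as \\n, so the
    result never contains a real line break.
    """
    return "exec(%r)" % fragment


# A compound statement cannot follow ';' on the same logical line.
joined_plain = "x = 1; if x: y = 2"

# Hiding the compound statement inside exec() makes the line legal again.
joined_exec = "x = 1; " + wrap_in_exec("if x:\n    y = 2")

print(compiles(joined_plain))
print(compiles(joined_exec))
```

The wrapped version stays on one line, but the real code is now an opaque string constant, which is why the disassembly is so unhelpful.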

About here I started thinking, "Wow, this would be so easy if I could just generate my own bytecode." Embarking on a wild goose chase, I explored compile.c, the "compiler" module and the ByteCodeHacks project. To summarize many hours of grimacing at the computer and writing on little pieces of paper, this is a no go -- especially if we want to support many versions of Python portably.

So here is my solution: continue to generate Python source and use ';' to connect lines. But I am going to use the "compiler" module to generate an AST and determine whether the code fits the strange rules Python applies in this case.
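A rough sketch of that check follows. The "compiler" module only existed in Python 2, so this version swaps in the standard `ast` module from Python 3. The function name `can_join` and the exact rules it applies are assumptions for illustration, not the plan's final form. It rejects compound statements, then re-parses the joined line to confirm the statement count did not change.

```python
import ast

# Compound statements may not follow ';' on a logical line. Some node types
# only exist in newer Python versions, so look them up defensively.
_COMPOUND_NAMES = (
    "If", "For", "While", "Try", "TryStar", "With", "Match",
    "FunctionDef", "ClassDef", "AsyncFunctionDef", "AsyncFor", "AsyncWith",
)
COMPOUND = tuple(getattr(ast, name) for name in _COMPOUND_NAMES
                 if hasattr(ast, name))


def can_join(fragments):
    """Return True if the fragments can safely share one line via ';'."""
    try:
        trees = [ast.parse(fragment) for fragment in fragments]
    except SyntaxError:
        return False
    # Any compound statement rules out joining.
    if any(isinstance(node, COMPOUND) for tree in trees for node in tree.body):
        return False
    joined = "; ".join(fragment.strip() for fragment in fragments)
    try:
        joined_tree = ast.parse(joined)
    except SyntaxError:
        return False
    # A trailing comment, for example, would silently swallow later fragments.
    expected = sum(len(tree.body) for tree in trees)
    return len(joined_tree.body) == expected


print(can_join(["x = 1", "y = 2"]))
print(can_join(["x = 1", "if x: y = 2"]))
print(can_join(["x = 1  # note", "y = 2"]))
```

Fragments that fail the check can fall back to being emitted on their own lines, at the cost of the line numbers drifting for that stretch.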

One of these days, all this may result in some actual code...
