Advogato: Blog for braden

Shedding ANTLR

My word… was it really all the way back in 2001 that I first latched onto the idea of replacing OpenVRML’s ANTLR-based parser with one written using Spirit?

I guess I shouldn’t be so surprised; I was immediately taken with Spirit upon discovering it. Back in 2001, though, Spirit still had a good deal of growing up to do; it’s come a long way since then. And given that Joel de Guzman and his cohorts are hard at work on Spirit 2, I get the idea that it will yet go a good deal further. But I digress.

It’s the middle of 2007 and I’ve gotten serious about writing a VRML parser for OpenVRML using Spirit. Why now? Well, ANTLR certainly hasn’t gotten to be any less of an annoying dependency over the years. But another factor is that a new major version of ANTLR has been released (3.0). I think my efforts are better spent moving away from ANTLR entirely than on upgrading to the new version (which I understand to include nontrivial changes to the grammar format).

Now, I don’t want to come across as disliking ANTLR. It’s a really nice tool. In fact, it’s the nicest parser generator I’ve ever used. Functionality is very discoverable and I found, for the most part, the general behavior of ANTLR parsers to be very intuitive. But it has its downsides:

It’s a code generator, and thus it has the caveats that go with any code generation.
It’s a Java program, and thus it has the caveats that go with any Java program.
Even though it can generate code for a number of languages that are Not Java, the Not Java language backends are maintained by persons other than the primary author of ANTLR; thus, these languages wind up being second-class citizens. (For example, ANTLR 3.0 has been released without C++ support since the maintainer of the C++ backend for ANTLR 2.x didn’t have time to port it to the new version.)

I actually started in earnest on this project at the end of last summer. I made a good deal of headway, getting as far as developing a good understanding of how to use Spirit’s stored_rule to create a grammar with productions that get modified as part of the parse. This solved the somewhat tricky issue of parsing node field values. But then I got sidetracked with getting the stand-alone viewer (openvrml-player) working reliably; that took much, much longer than I’d anticipated. Now I’ve picked up pretty much where I left off. As of this writing, I can parse nodes, EXTERNPROTOs, and PROTOs, except for IS mappings. I still have to do ROUTEs, though they will be pretty easy now that I’ve got DEF working.
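To make the "productions that get modified as part of the parse" idea concrete, here is a toy sketch. It is not the OpenVRML parser and it does not use Spirit's API: a plain std::function stands in for a stored_rule, and the names rule, parse_sffloat, parse_sfvec3f, and parse_fields are invented for illustration. The point is only that the rule used for a field's value is rebound once the field name is known, since the name determines the value's type.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// A "rule" is just a callable that consumes input and reports success.
// It stands in for a reassignable Spirit rule; it is NOT Spirit's API.
using rule = std::function<bool(std::istream &)>;

// Hypothetical value parsers for two VRML field types.
inline bool parse_sffloat(std::istream & in)
{
    float f;
    return static_cast<bool>(in >> f);
}

inline bool parse_sfvec3f(std::istream & in)
{
    float x, y, z;
    return static_cast<bool>(in >> x >> y >> z);
}

// Parse a whitespace-separated sequence of "fieldName value" pairs.
// field_types plays the role of a node type definition: it says which
// value parser goes with each field name.
inline bool parse_fields(const std::string & text,
                         const std::map<std::string, rule> & field_types)
{
    std::istringstream in(text);
    rule field_value; // the production that is rebound during the parse
    std::string field_id;
    while (in >> field_id) {
        const auto pos = field_types.find(field_id);
        if (pos == field_types.end()) { return false; } // unknown field
        field_value = pos->second; // rebind the rule for this field's type
        if (!field_value(in)) { return false; }
    }
    return true;
}
```

With a table that maps translation to parse_sfvec3f and radius to parse_sffloat, the input "translation 1 2 3 radius 0.5" is accepted, while "translation 1 2" is rejected because the third component is missing.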

As with the ANTLR-based parser, I’m doing a good deal of semantic checking; this parser will be just as aggressive about rejecting code that’s Not Quite Right as OpenVRML’s current parser. But unlike OpenVRML’s current parser, I’m using very little of OpenVRML’s runtime machinery to accomplish this checking. The idea is to make this parser much more reusable than OpenVRML’s current parser. The current parser isn’t really exposed; users can read the file into the runtime and then inspect the node tree created for the runtime. It turns out, though, that a good many of OpenVRML’s users (and prospective users) don’t care one bit about a VRML/X3D runtime; they just want to read a VRML or X3D file and do something with the data. So, the new parser will have:

Pluggable semantic actions
Minimal dependencies on the rest of libopenvrml—ideally, linking with libopenvrml won’t be required at all for someone just using the parser

This will all be possible through the magic of Spirit and C++ templates.
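As a rough picture of what pluggable semantic actions can look like with templates, here is a minimal sketch. Everything in it is hypothetical: the function parse_nodes, the on_node callback, and the collect_node_types struct are invented for illustration, and the "grammar" is a drastic simplification (empty nodes only, with braces separated by whitespace). It uses neither Spirit nor libopenvrml.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse a flat list of "TypeName { }" statements and report each node type
// to a caller-supplied actions object. Actions can be any type with an
// on_node(const std::string &) member; that interface is an assumption
// made for this sketch.
template <typename Actions>
bool parse_nodes(const std::string & text, Actions & actions)
{
    std::istringstream in(text);
    std::string type_id, open, close;
    while (in >> type_id) {
        if (!(in >> open >> close) || open != "{" || close != "}") {
            return false; // malformed node body
        }
        actions.on_node(type_id); // the pluggable semantic action
    }
    return true;
}

// One possible plug-in: collect the node type names and nothing else,
// with no runtime involved.
struct collect_node_types {
    std::vector<std::string> types;
    void on_node(const std::string & type_id) { types.push_back(type_id); }
};
```

Because the actions type is a template parameter, someone who only wants the data can supply a small struct like collect_node_types, while a runtime could supply one that builds its node tree; the parsing code itself stays the same.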

Syndicated 2007-06-18 05:21:22 (Updated 2007-07-02 18:43:10) from endoframe :: log

18 Jun 2007 braden » (Journeyer)