4 Nov 2000 japhy   » (Journeyer)

Whoo hoo. People know me here. That's nice. :)

Rather than post an article, I'm going to diarize my diatribe. I'm working on a paper for the next Perl conference in California, TPC 5.0. It's basically about a whole new regular expression paradigm that would theoretically work for any dialect of regexes. The concept is simple. Matching variable-width things at the END of a string is not efficient -- reverse the string and the sense of the regex, and you have a much faster process.

The simplest example is matching the last sequence of digits in a string. With a regex like (\d+)\D*$, the engine will find EACH occurrence of \d+, and fail for EACH occurrence but the last one. This is a lot of failure for a long string. So, reverse the input string. Reverse the sense of the regex to ^\D*(\d+). Then, reverse the text that was captured to get the digits back in their original order.
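Here's a rough sketch of that three-step trick, to make it concrete. It's in Python rather than Perl, and it is not the module mentioned in this entry; the function names and the sample string are made up for illustration. It only shows that the forward and reversed approaches agree on the answer, not how much faster one is.

```python
import re

def last_digits_forward(s):
    # Straightforward version: the engine tries every digit run in turn
    # and only succeeds on the one that is followed by non-digits to the end.
    m = re.search(r"(\d+)\D*$", s)
    return m.group(1) if m else None

def last_digits_reversed(s):
    # Step 1: reverse the input string.
    rev = s[::-1]
    # Step 2: match with the reversed sense of the regex.
    m = re.search(r"^\D*(\d+)", rev)
    # Step 3: reverse the captured text back into its original order.
    return m.group(1)[::-1] if m else None

sample = "abc 123 def 4567 ghi"  # hypothetical input
print(last_digits_forward(sample), last_digits_reversed(sample))
```

Both calls should hand back the same digit run; the reversed one just gets there on the first run of digits it sees.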

I've developed a Perl module for reversing many simple regular expressions. What would be nice is an optimizer, since in ^\D*(\d+), the ^ and \D* are totally extraneous.
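To see why the ^ and \D* add nothing here, this little check (Python again, with hypothetical names and test strings, and not the optimizer itself) compares the anchored pattern against a bare \d+ on already-reversed input. A leftmost search for \d+ has to skip the leading non-digits anyway, so it lands on the same run.

```python
import re

def first_digits_anchored(rev):
    # The mechanically reversed pattern, with the anchor and \D* kept.
    m = re.search(r"^\D*(\d+)", rev)
    return m.group(1) if m else None

def first_digits_simplified(rev):
    # What an optimizer could reduce it to: just the first digit run.
    m = re.search(r"\d+", rev)
    return m.group(0) if m else None

# Hypothetical reversed inputs, including edge cases.
for rev in ["ihg 7654 fed 321 cba", "123abc", "abc", "", "a1b22"]:
    assert first_digits_anchored(rev) == first_digits_simplified(rev)
```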

Anyway, it's cool, fun, efficient, and writing the parser helped me understand how less sleep makes regular expressions make more sense. ;)
