8 Feb 2002 chromatic   » (Master)

Scary Perl Refactoring:

I first formally encountered the idea of refactoring while studying XP. I'd been doing it already (especially when giving advice to new programmers), but didn't have a vocabulary for it. As my study progressed, I learned that Smalltalk (among other languages) has a Refactoring Browser. If refactorings can be treated as mechanical transformations, why shouldn't a machine be able to perform them?

At TPC 5.0, Schwern's Refactoring talk demonstrated the beginnings of a Perl parser that warned about dubious constructs. Parsing Perl with anything but perl obviously isn't simple: Damian hasn't released Parse::Perl, though I'm impressed with perltidy.

Looking at Smalltalk in more detail made me think that operating on source code is the hard way. Working on bytecode would be much easier, except for associating lines of code with opcodes. (I worked up a source filter that inserts a target carrying comments and code at appropriate points, so this data can be extracted later as necessary.) When I talked to Ned Konz and Simon Cozens, both of them thought bytecode was the right track.

Another piece of the puzzle came while I was writing an article about the Linux Kernel Janitors. The idea behind the Stanford Checker really stood out: if you can demonstrate an error pattern, the compiler can look for it in the code. Obviously, I can take a bit of bad Perl, compile it to bytecode, and get a tree that represents the bad pattern. As Simon pointed out, though, searching a tree for a tree is a difficult problem.
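A naive version of "search a tree for a tree" is short to write, which helps show where the real difficulty lies. Here is a minimal sketch in Python rather than Perl. The tuple trees, the op names (loosely borrowed from Perl's `sassign`, `const` and `padsv`), and the `*` wildcard are hypothetical illustrations, not real B::Terse output, and the helpers `leaf`, `matches` and `find_all` are made up for this sketch.

```python
from typing import Iterator, Tuple

# Hypothetical tree shape: (name, children). Names loosely mimic Perl op
# names, but these trees are illustrations, not real B::Terse output.
Node = Tuple[str, tuple]

WILDCARD = "*"  # hypothetical pattern node that matches any single subtree


def leaf(name: str) -> Node:
    """Build a node with no children."""
    return (name, ())


def matches(pattern: Node, tree: Node) -> bool:
    """True if `pattern` matches `tree` exactly, rooted at this node."""
    p_name, p_kids = pattern
    if p_name == WILDCARD:
        return True
    t_name, t_kids = tree
    if p_name != t_name or len(p_kids) != len(t_kids):
        return False
    return all(matches(p, t) for p, t in zip(p_kids, t_kids))


def find_all(pattern: Node, tree: Node) -> Iterator[Node]:
    """Yield every subtree of `tree` that `pattern` matches, in pre-order."""
    if matches(pattern, tree):
        yield tree
    for child in tree[1]:
        yield from find_all(pattern, child)


# Example: look for a scalar assignment whose first child is a constant.
tree = ("leave", (
    leaf("nextstate"),
    ("sassign", (leaf("const"), leaf("padsv"))),
    leaf("nextstate"),
    ("sassign", (leaf("padsv"), leaf("padsv"))),
))
pattern = ("sassign", (leaf("const"), leaf(WILDCARD)))

for hit in find_all(pattern, tree):
    print(hit)
```

This brute-force matcher tries the pattern at every node, so it costs roughly n * m comparisons for n tree nodes and m pattern nodes, and it only handles fixed shapes with single-subtree wildcards. Richer patterns (matching at any depth, or capturing subtrees to reuse in a rewrite) are where it gets genuinely hard.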

I didn't entirely agree. Though it's usually stupid to disagree with someone that smart, sometimes it can lead to a good idea. For some reason, I thought it was doable. One afternoon, while I was walking near a koi pond, it hit me.

The XML guys have, more or less, solved this.

Okay, the LISP guys may be able to make a better case, but the important thing is that it's solvable. I thought about sending Matt Sergeant an e-mail asking him how to take some of the rules of XPath and apply them to a non-XML tree. For a few months, the whole idea was on the back burner.

Nearly all of the pieces are already in place. There's a bytecode decompiler in B::Terse (I knew a bit about it, having written tests for it). There's a bytecode generator in B::Generate (thanks to Simon). There's a bytecode-to-Perl converter in B::Deparse (thanks to a lot of people, lately especially Rafael Garcia-Suarez). I just needed some way to apply something like XSLT to Perl bytecode.

This morning, it hit me. Maybe it was the Perl XML fans talking about SAX being important for more than XML, but I realized that if I could write a backend module to turn bytecode into XML, tree matching and transformation would become solved problems. The only tricky part left is writing the XSLT, XPathScript, or whatever other syntax it takes to refactor an error pattern. The same XML guys who provided the final nudge can probably help out there.
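To make the XML idea concrete, here is a minimal sketch of the same pipeline in Python, chosen because it is self-contained: Python's own `ast` module stands in for a Perl op tree, and ElementTree's limited XPath subset stands in for full XPath or XSLT. `to_xml` and `find_calls` are hypothetical helpers, not B::ToXML's interface, and flagging calls to `eval` is just an example error pattern.

```python
import ast
import xml.etree.ElementTree as ET


def to_xml(node: ast.AST) -> ET.Element:
    """Recursively turn a Python AST node into an ElementTree element.

    Child nodes become nested elements (wrapped in an element named after
    the field); plain scalar fields become attributes.
    """
    elem = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(elem, field).append(to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):  # non-node list items are skipped
                    wrapper.append(to_xml(item))
        elif value is not None:
            elem.set(field, str(value))
    # Keep the source line: the analog of mapping opcodes back to code.
    if hasattr(node, "lineno"):
        elem.set("lineno", str(node.lineno))
    return elem


def find_calls(source: str, func_name: str) -> list:
    """Return the line numbers of direct calls to `func_name`."""
    root = to_xml(ast.parse(source))
    # ElementTree's XPath subset: // (descendants), child steps, [@attr='v'].
    query = f".//Call/func/Name[@id='{func_name}']"
    return [int(name.get("lineno")) for name in root.iterfind(query)]


source = "x = input()\nprint(eval(x))\ny = eval('1 + 2')\n"
print(find_calls(source, "eval"))  # prints [2, 3]
```

Once the tree is XML, the matching side collapses to a one-line query. The rewriting side ("go from this to this") is not attempted here; that is where an XSLT-style transformation would come in.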

So now I have B::ToXML, which can XMLize a code reference, and it works pretty well. If I knew the internals better, I'd be able to tell what kind of information is important. I don't yet have any way to say "go from this to this, keeping this but changing this", but I'm one step closer.

I could still be on the wrong track, but I really think I'm on to something here. Drop me a line if you have a strong opinion either way.
