I first formally encountered the idea of refactoring while
studying XP.
I'd been doing it already (especially when giving advice to
new programmers), but didn't have a vocabulary for it. As
my study progressed, I learned that Smalltalk (among other
languages) has a Refactoring Browser -- since refactorings
can be considered mechanical transformations, why wouldn't a
machine be able to do them?
In his Refactoring talk at TPC 5.0, Schwern demonstrated the
beginnings of a Perl parser that warned about dubious
constructs.
Parsing Perl with anything but perl obviously isn't a simple
task -- Damian hasn't released Parse::Perl, though I'm
impressed with perltidy.
Looking at Smalltalk in more detail made me think that
operating on source code is the hard way. Working on
bytecode would be much easier -- except for associating
lines of code with opcodes. (I worked up a source filter
that would insert a target with comments and code on
appropriate occasions. This data can then be extracted as
necessary.) When I talked to Ned Konz and Simon Cozens, both
thought that bytecode was the right track.
Another piece of the puzzle came while I was writing an article
about the Linux Kernel Janitors. The idea behind the Stanford Checker
really stood out: if you can demonstrate an error pattern,
the compiler can look for it in the code. Obviously, I can
take a bit of bad Perl, compile it to bytecode, and have a
tree that marks a bad pattern. As Simon pointed out,
though, searching a tree for a tree is a difficult problem.
I didn't entirely agree. Though it's usually stupid to
disagree with someone that smart, sometimes it can lead to a
good idea. For some reason, I thought it was doable.
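To make "searching a tree for a tree" concrete, here is a minimal sketch in plain Perl. It is an illustration only, not real op tree handling: the node shape (a hash with name and kids), the '*' wildcard, and the helper names node, matches, and find_matches are all assumptions made up for this example. The naive approach simply tries the pattern at every node.

```perl
use strict;
use warnings;

# Hypothetical node shape: { name => 'opname', kids => [ child nodes ] }.
sub node {
    my ($name, @kids) = @_;
    return { name => $name, kids => \@kids };
}

# True if $pattern matches the subtree rooted at $node.
# A pattern named '*' matches any node. Pattern kids must match the
# node's kids position by position; extra kids on the node are ignored.
sub matches {
    my ($node, $pattern) = @_;
    return 0 unless $pattern->{name} eq '*'
                 || $pattern->{name} eq $node->{name};
    my @want = @{ $pattern->{kids} };
    my @have = @{ $node->{kids} };
    return 0 if @want > @have;
    for my $i (0 .. $#want) {
        return 0 unless matches($have[$i], $want[$i]);
    }
    return 1;
}

# Collect every node in $tree at which $pattern matches.
sub find_matches {
    my ($tree, $pattern) = @_;
    my @found = matches($tree, $pattern) ? ($tree) : ();
    push @found, find_matches($_, $pattern) for @{ $tree->{kids} };
    return @found;
}

# A toy tree using op-like names, and a pattern for
# "assignment of a constant to anything".
my $tree = node('leavesub',
    node('lineseq',
        node('sassign', node('const'), node('padsv')),
        node('print',   node('pushmark'), node('padsv')),
    ),
);
my $pattern = node('sassign', node('const'), node('*'));

my @hits = find_matches($tree, $pattern);
print scalar(@hits), " match(es)\n";
```

This brute-force walk is fine for small trees. The hard part Simon was pointing at is doing it efficiently and expressively, which is where a ready-made query language starts to look attractive.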
Walking near a koi pond one afternoon, it hit me.
The XML guys have, more or less, solved this.
Okay, the LISP guys may be able to make a better case, but the
important thing is that it's solvable. I thought about
sending Matt Sergeant an e-mail, asking him how to take some
of the rules of XPath
and apply them to a non-XML tree. For a few months, the
whole idea was on the back burner.
Nearly all of the pieces are in place already. There's an
op tree printer in B::Terse (and I knew a bit about it,
having written tests for it). There's a bytecode generator
in B::Generate (thanks to Simon). There's a bytecode to
Perl converter in B::Deparse (thanks to a lot of people,
especially Rafael Garcia-Suarez, lately). I just needed to
find some way to apply something like XSLT to Perl bytecode.
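As a small illustration of the last of those pieces, B::Deparse can turn a compiled code reference back into Perl source through its coderef2text method. The sample subroutine here is made up for the example, and the exact text that comes back varies between perl versions.

```perl
use strict;
use warnings;
use B::Deparse;

# A throwaway subroutine to round-trip through the compiler.
my $code = sub { my $x = shift; return $x * 2 };

# coderef2text returns the body of the sub as Perl source text.
my $deparse = B::Deparse->new;
my $text    = $deparse->coderef2text($code);

print "$text\n";
```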
This morning, it hit me. Maybe it was the Perl XML fans
talking about SAX being important for more than XML, but I
realized that if I could write a backend module to turn
bytecode into XML, the tree matching and conversions would
be solved. The only tricky part that's left is generating
XSLT or XPathScript or whatever syntax to refactor an error
pattern. The same XML guys who provided the final nudge can
probably help out in that respect.
So now I have B::ToXML that can XMLize a code reference, and
it works pretty well. If I knew the internals better, I'd
be able to tell what kind of information is important. I
don't yet have any way to say "go from this to this, keeping
this but changing this", but I'm one step closer.
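For readers who want to see the general shape of the idea, here is a minimal sketch of walking a code reference's op tree with the core B module and printing it as nested XML. This is not the B::ToXML code. The op element, its name attribute, and the function names op_to_xml and coderef_to_xml are assumptions chosen for the example, and a real version would need to record much more than the op name.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Return an XML fragment for $op and everything beneath it.
# Element and attribute names are hypothetical.
sub op_to_xml {
    my ($op, $depth) = @_;
    $depth ||= 0;
    my $indent = '  ' x $depth;
    my $name   = $op->name;

    # Only ops flagged OPf_KIDS have a first child; the children
    # are chained through sibling until a null op is reached.
    my @kids;
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            push @kids, $kid;
        }
    }

    return qq{$indent<op name="$name"/>\n} unless @kids;

    my $xml = qq{$indent<op name="$name">\n};
    $xml .= op_to_xml($_, $depth + 1) for @kids;
    $xml .= "$indent</op>\n";
    return $xml;
}

# Start from the root op of a compiled subroutine.
sub coderef_to_xml {
    my ($code) = @_;
    return op_to_xml(svref_2object($code)->ROOT);
}

print coderef_to_xml(sub { my $x = 1; print $x });
```

Once the tree is in this form, an XPath-style query for an op with a particular name and particular children becomes a job for existing XML tools. The exact ops that appear depend on the perl version and its optimizer.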
I could still be on the wrong track, but I really think I'm
on to something here. Drop me a line if you have a
strong opinion either way.