Advogato: Blog for emk

Python Iterator Bug. Python 2.2 supports CLU-style iterators (they call them generators). They're essentially a kind of co-routine.

But there's a subtle bug in the Python design. Consider tree traversal:

def inorder(t):
    if (t.left != None):
        for (node in inorder(t.left)):
            yield node
    yield t
    if (t.right != None):
        for (node in inorder(t.right)):
            yield node

If you study this carefully, you'll see that (unless the optimizer intervenes), Python has turned a perfectly good O(N) tree traversal into an O(N log N) traversal.

How to Fix It. I'm probably going to write a paper with the details, and try to get it published, but here's the crux of the matter.

Python insists on performing the two subiterations manually, and on returning their output unchanged. But if you stack enough of these iterations on top of each other, you can kiss your performance goodbye (both asymptotically and cycle-wise).

Instead, you need to transfer control to your subiterations:

def inorder(t):
    if (t.left != None):
        yield_all inorder(t.left)
    yield t
    if (t.right != None):
        yield_all inorder(t.right):

Now the compiler has a fighting change to optimize a deep stack of iterators into reasonable code.

It turns out that you can formulate such concepts as self-recursive iterators, tail-called iterators, etc., and apply optimization techniques analogous to those used by LISP compilers to optimize function calls. What's worse, you can actually implement all of this using a portable C back-end to your compiler. :-)

27 Aug 2001 emk » (Master)