Older blog entries for emk (starting at number 21)

I'm banging away on a native code generator. Basically, we needed a bytecode interpreter, and I had written one too many bytecode VMs. Remembering a comment by David Simmons on his SmallScript work--he argued that JITs perform 10 times better than VMs for about the same amount of work--I decided to hack up a native code generator instead.

Sure enough, it's easy, at least once you figure out the platform's calling conventions and what a ModR/M byte actually is.

Now, generating good code is an entirely different matter. But I'm not trying to do that. :-)

Python Iterator Bug. Python 2.2 supports CLU-style iterators (they call them generators). They're essentially a kind of co-routine.

But there's a subtle bug in the Python design. Consider tree traversal:

def inorder(t):
    if (t.left != None):
        for (node in inorder(t.left)):
            yield node
    yield t
    if (t.right != None):
        for (node in inorder(t.right)):
            yield node
If you study this carefully, you'll see that (unless the optimizer intervenes), Python has turned a perfectly good O(N) tree traversal into an O(N log N) traversal.

How to Fix It. I'm probably going to write a paper with the details, and try to get it published, but here's the crux of the matter.

Python insists on performing the two subiterations manually, and on returning their output unchanged. But if you stack enough of these iterations on top of each other, you can kiss your performance goodbye (both asymptotically and cycle-wise).

Instead, you need to transfer control to your subiterations:

def inorder(t):
    if (t.left != None):
        yield_all inorder(t.left)
    yield t
    if (t.right != None):
        yield_all inorder(t.right):
Now the compiler has a fighting change to optimize a deep stack of iterators into reasonable code.

It turns out that you can formulate such concepts as self-recursive iterators, tail-called iterators, etc., and apply optimization techniques analogous to those used by LISP compilers to optimize function calls. What's worse, you can actually implement all of this using a portable C back-end to your compiler. :-)

26 Aug 2001 (updated 27 Aug 2001 at 14:08 UTC) »
My Favorite Toy Language. Oh, dear. My design efforts have gotten too far ahead of my coding efforts, opening me to charges of bogosity. :-/ An overview of what I'm up to...

Soft Typing. MFTL is softly typed. This means that (1) everything, including primitive types, is a subclass of Object, (2) type declarations all default to Object and (3) downcasting is automatic. For example, the following program is legal:

def fact_untyped (i)
  if (i <= 1)
    i * fact_untyped(i-1)

But if the programmer types the following, they get much better performance and compile-time warnings about type violations:

def fact_untyped (i: int): int
  if (i <= 1)
    i * fact_untyped(i-1)
Everything's An Expression. In the above example, 'if' is an expression, like the ternary operator in C. So you could write:
var x = if (y) 1 else 2 end

Newlines End Statements. Just like in sh, bash, JavaScript, and Ruby, newlines in MFTL are statement terminators. (This is slightly funky, but it allows 'if' to be an expression without massively cruftifying the semicolon-placement rules.)

Keyword-Based Initialization. Classes generally don't need constructors:

class Sample ()
  var x, key: x
  var y = 10
s = new(Sample, x: 5)

Generic Functions. You can do the equivalent of operator-overloading at run time, not just compile time.

abstract class Thing () end
class Rock (Thing) end
class Paper (Thing) end
class Scissors (Thing) end
def defeats? (a: Thing, b: Thing) false end
def defeats? (a: Paper, b: Rock) true end
def defeats? (a: Rock, b: Scissors) true end
def defeats? (a: Scissors, b: Paper) true end
defeats?(new(Paper), new(Scissors)) // false
defeats?(new(Rock), new(Scissors))  // true

Yes, we know how to make this go fast.

Getters and Setters. No more endless 'getFoo' and 'setFoo' functions! Just write:

public class Example ()
  public var foo, key: foo

If you later decide that you need a getter and setter function, just write:

public class Example ()
  var real_foo, key: foo
  public def get foo ()
  public def set foo (value)
    real_foo = value
The users of the class will never know the difference.

Other Stuff. Things I want, but which I know will require extra work to do right: inlineable iterators, design by contract, integrated unit tests, simple templates.

Performance. With full type declarations, it should be possile to compile MFTL to run at speeds approaching that of C. Softly-typed languages are a solved problem.

Implementation Status. The parser and the VM are about 50% complete. The compiler was about 2% complete before I threw it out and started over. :-( If I had the luxury of working on this project full-time, I could ship a demo interpreter in about three months, and an MFTL-to-C compiler not long after.

Feedback. Comments, suggestions to eric.kidd@pobox.com.

Certification. I'm curious about the ongoing saga with aaronsw's certification. He's proving suprisingly hard to suck into the local web of trust.

So... I read up on his projects, looked as his work, reviewed the certification guidelines, and tried to give him an appropriate certification (based on what I could quickly learn).

Still no change. Very weird.

I'm quite bewildered by this trust metric. One very generous (but, IMHO, undeserved) master-level certification was enough to drag me up from observer to master. But the combined efforts of quite a few people aren't helping aaron at all.

MFTL. Wow! Between my short-term job at MIT (helping implement a funcional programming language) and my hobby (designing a functional/imperative/OO language), I'm beginning to find a whole bunch of good answers to design questions. It's nice to work with people who are much smarter than you are. ;-)

Poll: Do I dare to open up port 80 on my cable-modem system, and let people access my language design Zwiki? Or is that just begging for script kiddies to open fire at Zope and various Apache modules?

XML-RPC: I finished merging the Windows patches from several contributors; these will all go out in the next release.

I've also started work on xml-rpc-api2cpp, which is essentially an IDL compiler for XML-RPC. Instead of using IDL files, however, it directly queries the remote server for API information.

My XML-RPC hacking time is down to a few hours here and there. This will change in six or seven weeks. Yay!

XML-RPC Acceleration: Spent this morning hacking mod_gzip to to understand "deflate" compression. If you drop this hack into your webserver (and use a smart XML-RPC client), you'll probably cut your outbound XML-RPC network traffic by a factor of 10.

It's very experimental--I can make it dump core--but it's the start of something moderately nifty. Combine this with boxcarring, and you're beginning to get some decent scalabity and throughput.

Binary Data: I also spoke with several of Flight Gear developers (all very cool folks), and discussed the possibility of XML-RPC without any XML. Basically, a client and server could negotiate away the XML layer, and transmit raw binary data structures to each other. You'd keep all the fun features of XML-RPC (the introspection, the dynamic data, the 750-line clients), but get enterprise-grade RPC when you really needed it.

The hard part of this would be the design, not the implementation. How do you hide the funky new features from the less-advanced clients, and how do you activate the new protocol?

Introspection: I wrote a script called xml-rpc-api2txt. Give it the URL of a server, and it will print out a nicely-formatted interface specification, complete with documentation. I fixed it to play nicely with Meerkat, too--O'Reilly was preformatting some of their documentation strings, which was messing up Perl's formatting commands.

Now, who wants to hack this script to automatically generate C++ and Java classes for a given server? :-)

Community: Yikes! Things are really starting to move. I got piles of e-mail today, half of which contained patches and the other half of which contained great ideas.

SourceForge: Is full of bugs.

XML-RPC Problems: Spoke with Adrian about RedHat's experience with XML-RPC. It seems that they've run into a very big problem: round-trip HTTP message latency. They tried to make lots of little XML-RPC calls, and their performance died miserably.

I did some research on HTTP pipelining (which xmlrpc-c already supports), but this doesn't fix the problem. It seems that HTTP always requires a minimum of one round-trip packet per request. So if you've got a 250ms ping time, you can't make more than two XML-RPC method calls per second.

This is totally unacceptable. Oh, sure, you can work around it by reducing your number of function calls to a minimum (which is what the RedHat Network does), but your APIs will still be bletcherous.

A Proposed Fix: Let's say you're invoking a simple-minded addition function:

>>> import xmlrpclib
>>> multi = xmlrpclib.Server("http://localhost/cgi-bin/multi-cgi.cgi")
>>> multi.sample.add(1, 1)

Now, if you need to perform a zillion additions, this is really going to suck. But since XML-RPC is so dynamic, there's no reason why you can't just do something like this:

>>> add_2_2 = {'methodName': 'sample.add', 'params': [2, 2]}
>>> add_4_4 = {'methodName': 'sample.add', 'params': [4, 4]}
>>> multi.system.multicall([add_2_2, add_4_4])
[[4], [8]]

On the back end, my new server code unpacks this transparently, calls all the appropriate handlers, and packages up the result.

If you're a serious RPC hacker, I'd really appreciate some feedback.

On related note, Ryan Dietrich is hacking on an asynchronous XML-RPC message queue. Way cool!

XML-RPC: The XML-RPC HOWTO has now become an official part of the Linux Documentation Project! You can find it in the usual place. The PDF and PostScript versions are still broken. This is being investigated.

(You can tell from my bubbly enthusiasm that this is my first submission to the LDP.)

I want to follow RedHat's lead on XML-RPC/SSL, but I need some good crypto advice.

XML-RPC: Hack, hack, hack.

I've added support for HTTP Basic Authentication to xmlrpc-c. This required refactoring the client library to use server objects (instead of just URLs). This enhancement turned into a nightmare debugging job, because the following software sucks more than strictly necessary:

  • PHP: Every time I write a non-trivial PHP program, I spend 15 minutes wondering why some variable is unset when it should contain data. Then I realize that I've left out a global declaration, and PHP has silently created an empty local variable shadowing some global.

  • w3c-libwww: Libwww has this weird, asynchronous callback model, and the API isn't especially well documented. Once you define a callback, libwww feels entitled to call it at the strangest of times--during some failed network operations (but not others), during library shutdown, and whenever you look at it cross-eyed. And sometimes libwww will pass you a "200 OK" when it hasn't even made a network connection, and your chunk pointer will mysteriously be NULL. So my code is now laced with incredibly paranoid assertions, and basically trusts nothing returned by the library.

And let's not even ask which festering ball of bugs was truncating five bytes off the bottom of every XML document. But at least everything now works, and passes my evil test suites once again.

Jade: While I'm ranting about broken tools, I should mention Jade (and JadeTex, and TeX, and all the other hairy DocBook tools). This stuff is impossible to set up. It's badly broken in RedHat 6.2. It requires you to edit TeX *.ini files.

If you're not careful, you'll start mumbling the following nonsense in your sleep:

pdftex -ini -progname=pdfjadetex pdflatex.ini
pdftex -ini -progname=pdfjadetex \&pdflatex pdfjadetex.ini
# restore the original pdflatex.fmt file:
pdftex -ini -progname=pdflatex pdflatex.ini

(Thanks to Adam Di Carlo for figuring this out.)

It's just unbelievable how bad this stuff is. TeX allocates everything in fixed size buffers, and hard-codes the buffer sizes into all of its format files. Jade spews out ASCII NULLs, causing JadeTex to cough up a lung. TeX fmtutil wants to have a little talk with you, and find out where JadeTeX hid all the *.sty files. And of course, let's not forget the joy inherent in editing SGML CATALOG files to work around RedHat typos.

But the resulting manuals are fairly pretty... :-)

XML-RPC: Zope exports a complete scripting API using XML-RPC. But to use it, your xmlrpc client library needs to support HTTP Basic Authentication (or maybe cookies, depending on how the server is configured).

xmlrpc-c doesn't support any of these things, yet. But let's see if I can figure out how to do them with w3c-libwww...

12 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!