Compile Nightly, Run Faster

Posted 5 Nov 2002 at 10:41 UTC by tnt

Compiled programs have a speed advantage over interpreted programs. Even when the would-be interpreted program is JIT'ed, real compilation still has the advantage.

So why not just have your computer compile all the scripts and VM code on it while it is not doing anything else?

Although I had been a Linux user for a number of years, and had been at home at the shell for even longer (on IRIX and Solaris), I had never seen the locate command until recently. (I saw someone else using it at a VanLUG meeting.)

The locate command lets you perform extremely fast searches of your file system. It achieves this speed by what some might call "cheating". In the middle of the night, a special program called updatedb runs. updatedb goes and indexes your entire file system, and this index is what locate uses when it performs searches. It doesn't actually search your file system; instead, it searches the index that updatedb made the night before.
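To make the "cheating" concrete, here is a toy Python sketch of the updatedb/locate split: one expensive walk of the file system builds an index, and every later search reads only that index. The function names are hypothetical, and pickle plus plain substring matching are stand-ins; the real updatedb and locate use their own database format and pattern syntax.

```python
import os
import pickle
import tempfile


def build_index(root):
    """The 'updatedb' step: walk the tree once and record every path."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def save_index(paths, index_file):
    """Persist the index so later searches never touch the live file system."""
    with open(index_file, "wb") as f:
        pickle.dump(paths, f)


def locate(pattern, index_file):
    """The 'locate' step: search the saved index, not the file system."""
    with open(index_file, "rb") as f:
        paths = pickle.load(f)
    return [p for p in paths if pattern in p]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "notes.txt"), "w").close()
        index_file = os.path.join(root, "locate.db")
        save_index(build_index(root), index_file)  # the "nightly" run
        # A file created after the nightly run stays invisible until the next run.
        open(os.path.join(root, "later.txt"), "w").close()
        print(locate("notes.txt", index_file))
        print(locate("later.txt", index_file))  # prints []
```

The last line shows the trade-off of the approach: searches are fast precisely because they can be a day out of date.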

This same idea could be applied to speeding up scripts and VM (Virtual Machine) code. In the middle of the night, while your computer is not doing anything else, a special program could run and compile all your scripts and VM code into native binaries.

Having said that, there are some details to work out.

The first detail is: where do we put the compiled versions of these scripts and VM code? (Surely we don't want to simply replace the originals.) One way would be to create a special hidden directory, perhaps called ".cat" ("CAT" being an initialism for Compiled Ahead of Time), and put the compiled Linux ELF versions of the scripts and VM code into that directory. (This could be done much as thumbnail images are put in the ".thumbnails" directory according to the Thumbnail Managing Standard.)

Then, whenever we want to execute a program, the operating system (or whatever) would check whether there is a ".cat" directory containing a compiled version of the script or VM code, and run that; only if there isn't one would it run the original (non-compiled) version.
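The ".cat" lookup could be sketched in Python as follows. The mapping (a file's compiled copy lives in a ".cat" directory next to it, under the same name) is an assumed convention for illustration; the article only proposes the directory name, and `compiled_path` and `choose_executable` are hypothetical helpers.

```python
import os

CAT_DIR = ".cat"  # directory name proposed in the article


def compiled_path(program):
    """Hypothetical convention: /usr/bin/mail.py -> /usr/bin/.cat/mail.py."""
    directory, name = os.path.split(os.path.abspath(program))
    return os.path.join(directory, CAT_DIR, name)


def choose_executable(program):
    """Prefer an executable compiled copy in .cat; otherwise use the original."""
    candidate = compiled_path(program)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return program
```

Whatever launches programs (the shell, a wrapper script, or something deeper in the operating system) would call something like `choose_executable` just before executing, so the user never has to know the compiled copy exists.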

We also have to consider what happens when we update a program: the compiled version of the script or VM code then belongs to the old version of the program. We could solve this by having the operating system (or whatever) compare the dates on the compiled version and the original. If the compiled version is newer, run that; otherwise, run the original.
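The date rule can be expressed as a small Python function that compares file modification times. `pick_by_date` is a hypothetical name, and modification time is assumed to be the "date" the article means.

```python
import os


def pick_by_date(original, compiled):
    """Run the compiled copy only if it exists and is newer than the original."""
    try:
        if os.path.getmtime(compiled) > os.path.getmtime(original):
            return compiled
    except FileNotFoundError:
        pass  # no compiled copy yet (or the original is missing): fall back
    return original
```

The strict `>` means that a compiled copy carrying the same timestamp as its source is treated as stale, which errs on the side of running the original.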

(There are likely more details to work out too.)

So what does this mean? It means that programs written in Perl, Python, shell script, Haskell, etc., or programs compiled to the Java JVM, the .NET/Mono CLR, or other VMs, run much, much faster.

Except for one problem, posted 5 Nov 2002 at 15:16 UTC by Pizza » (Master)

Interpreted languages give you considerably more flexibility than compiled languages, including the ability to dynamically generate and evaluate code on the fly. At the very least, your "script compiler" would have to be smart enough not to compile those scripts. Or, if you wanted to get fancy, you'd have to figure out which bits can't be compiled and run 'em in interpreted mode. And that's what makes JIT compilers such complicated beasts: they're very tightly coupled to the underlying language/VM specs. Then toss in things like class libraries...
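A small Python illustration of the problem Pizza describes: the function below is built from a string at run time, so there is no source file on disk for an overnight compiler to find. `make_power_function` is a hypothetical example, not code from any of the programs discussed.

```python
def make_power_function(exponent):
    # This source text does not exist until the program is running, so a
    # compiler that ran overnight could never have seen (or compiled) it.
    source = f"def power(x):\n    return x ** {int(exponent)}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["power"]


if __name__ == "__main__":
    exponent = 3  # imagine this arrived from user input or a config file
    cube = make_power_function(exponent)
    print(cube(2))  # 8
```

A nightly compiler could still compile the surrounding program, but `exec` means the runtime must keep some way (an interpreter or a built-in compiler) to handle code that appears later.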

Geez, it sounds so simple..., posted 5 Nov 2002 at 22:49 UTC by rasmus » (Master)

Do you know how hard it is to compile a loosely typed scripting language? All the typing happens at run time, depending on the data encountered. So you can't just compile it in the traditional sense; at best you could generate some sort of execution framework within which the code would run in some pseudo-accelerated state. You could call it compiled, I suppose, but it wouldn't really be compiled in the same sense that C or C++ is compiled.
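A Python sketch of what "typing happens at run time" means: looking only at `combine`, a compiler cannot choose a single machine-level addition, because the meaning of `+` depends on the values that arrive. The function name is hypothetical.

```python
def combine(a, b):
    # Nothing in this function says which "+" is meant; that is decided
    # each time it runs, from the types of the values passed in.
    return a + b


print(combine(2, 3))         # 5
print(combine("ab", "cd"))   # abcd
print(combine([1], [2, 3]))  # [1, 2, 3]
print(combine(0.5, 0.25))    # 0.75
```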

And yes, as Pizza mentions, dynamic code poses a problem, as do a number of other nifty conveniences that the various scripting languages give you.

pseudo-accelerated state?, posted 6 Nov 2002 at 02:19 UTC by dan » (Master)

Lisp has had dynamic typing (types are associated with values rather than variables) since its inception (sometime in the 1950s). It's had native code compilers since the 1970s. Any worthwhile Common Lisp implementation compiles these days; some don't even include an interpreter at all.

Can you elaborate on the problem you're talking about? Compiling shell or awk scripts would, I agree, continue to be a bitch, but my impression is that that has less to do with dynamic typing and more to do with (1) a generally rather silly evaluation-by-repeated-substitution model, and (2) most of the functionality is provided by forking external programs anyway, so compiling is really not going to win you much.

Replies, posted 6 Nov 2002 at 08:14 UTC by tnt » (Master)

This is a reply to Pizza's post titled "Except for one problem", rasmus's post titled "Geez, it sounds so simple...", and dan's post titled "pseudo-accelerated state?".

Let me apologize for the confusion. The article was not about how to compile scripting languages or VM byte-code, but about having it done automatically by the computer (assuming the compiling technology already existed).

It was about having Java byte-code programs, .NET/Mono CLR programs, and scripting-language programs run faster (if the technology to compile the VM's byte-code or the scripting language already existed) without the user having to do anything.

So, for example, this is what I envisioned. (I'll use a Java byte code example.)

I just downloaded and installed a new e-mail program that was written in Jython (and thus is in a binary Java VM byte-code format). I'm lucky enough that my system doesn't interpret the byte-code but JIT-compiles it. The JIT compiling makes the e-mail program run at an OK speed, but it is still somewhat sluggish.

Now imagine that I, the user, am a computer user of ordinary knowledge. In other words, I don't know what a compiler is, and even if I've seen the command line before, I would never use it.

I surely wouldn't use any ahead-of-time compiling technology (like that found in GCC) to compile my e-mail program to make it run faster. (As an ordinary computer user I wouldn't even have that kind of knowledge, not to mention that the command line would intimidate me.)

(This is where the automatic compiling system would come into play.) Now, my computer, being a sophisticated piece of machinery, would take care of all this stuff for me. Once a night, it would go through all the files on my computer. And eventually, it would find my new e-mail program. It would notice that my new e-mail program's binary was a Java VM byte-code binary. It would know that it could use GCC to compile this into native (Linux ELF, or whatever) code. It would do this. And the next time I ran my program it would magically be faster.
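
The "it would notice that my new e-mail program's binary was a Java VM byte-code binary" step could look roughly like the Python sketch below, which recognizes files by their leading magic bytes: 0xCAFEBABE for Java class files, 0x7F "ELF" for native Linux binaries, and "#!" for scripts. The helper names are hypothetical, and the sketch only collects candidates; it does not invoke a compiler (in 2002 that would have meant something like GCC's Java front end, gcj). Two caveats: Mach-O universal binaries also begin with 0xCAFEBABE, and classes packed inside a .jar (a zip archive) are not inspected here.

```python
import os

JAVA_CLASS_MAGIC = b"\xca\xfe\xba\xbe"
ELF_MAGIC = b"\x7fELF"


def classify(path):
    """Guess what kind of program a file is from its first four bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == JAVA_CLASS_MAGIC:
        return "java-bytecode"
    if head == ELF_MAGIC:
        return "native"
    if head[:2] == b"#!":
        return "script"
    return "other"


def nightly_candidates(root):
    """Walk the tree and list Java class files a nightly job could compile."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".cat"]  # skip compiled output
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if classify(path) == "java-bytecode":
                    found.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return found
```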

(The article also talked about some other things, like NOT overwriting the original script or VM byte-code binary with the native code, and a convention for where to store the compiled version.)

I hope that clears things up a bit.

Emit relocatable code in code cache, posted 6 Nov 2002 at 09:20 UTC by Dries » (Master)

There are compilers/VMs available that do remote compilation or proxy compilation.

You don't have to statically compile your Java applications overnight, though. Why not make your runtime compiler (JIT compiler) emit relocatable code (see section 7.3) that can be loaded from a code cache on disk the next time you start your Java application? After all, runtime compilers can use run-time profile information to optimize aggressively in ways a static compiler cannot.

Re: Pizza, posted 6 Nov 2002 at 13:35 UTC by mdanish » (Journeyer)

dan already covered this a little, but any decent Lisp environment lets you generate and compile code at runtime just as smoothly as any interpreter would, as well as interact with the compiled code. There is nothing intrinsic about an interpreter that makes this more flexible; it just makes it easier to implement. Quite frankly, the various "scripting" languages such as Python and Ruby have no excuse to be interpreted. But they're too busy reinventing the wheel to provide decent compilers. Perhaps if all these people had bothered to work on Python instead, there wouldn't be all these complaints about the speed of the various scripted programs.

IL Code, posted 6 Nov 2002 at 20:49 UTC by nymia » (Master)

The article is basically asking that most code which doesn't emit IL gain this feature; that is, support for IL code.

I don't know if that is possible, though, since a lot of interpreters don't even consider emitting IL code. Take, for example, Bash, which is basically a simple interpreter. Making it emit IL code would probably not make sense, since it would probably end up emitting C code anyway.

But that is only one exception; I'm sure there are languages out there that can really deliver IL code and execute it in a flash.

No, here's why, posted 6 Nov 2002 at 22:58 UTC by splork » (Master)

First, there should be no such thing as "idle" time on a computer. When it isn't busy using all of its CPU power, it should be busy saving power (i.e., drawing far fewer watts on your electric bill, draining less battery, polluting less air and water, etc.).

Second, Dries has it right. If something can benefit from optimization (JIT compilation, etc.), it should just cache "compilations" at run time for later use. That way, only things likely to be used are stored.

Third, date comparisons are not reliable. Use a comparison of the secure hash of the source code to decide which "compiled" binaries to load for it. When executing, look in the cache for entries matching that secure hash. (One example of this in use is the Inline::C Perl module, which lets you embed C code directly in Perl scripts; it caches compiled inline code in this manner.)
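A minimal Python sketch of a run-time compilation cache keyed by a secure hash of the source, along the lines splork describes. As a stand-in for native compilation it caches CPython code objects produced by the built-in `compile()` and serialized with `marshal`; `cache_key` and `load_or_compile` are hypothetical names, and marshal data is only valid for the Python version that wrote it.

```python
import hashlib
import marshal
import os


def cache_key(source):
    """Name cache entries by a SHA-256 of the source, not by file dates."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def load_or_compile(source, cache_dir, filename="<script>"):
    """Return (code_object, cache_hit); compile and store on a miss."""
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, cache_key(source) + ".bin")
    if os.path.exists(cached):
        with open(cached, "rb") as f:
            return marshal.load(f), True
    code = compile(source, filename, "exec")
    with open(cached, "wb") as f:
        marshal.dump(code, f)
    return code, False
```

Because the key is derived from the source text itself, any edit to the program produces a different key and the stale entry is simply never looked up, however the file dates happen to compare. Only code that actually runs gets compiled and stored. CPython's own `__pycache__` files follow a related idea and can optionally be validated by a source hash instead of a timestamp.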

mdanish: optimized, compiled Python would be nice if it could actually provide any speedup. I'd love to see it attempted, but given how dynamic the language is, I don't know if it would make much difference. (I'm ignoring the existing Python and Perl "compilers" that take the byte code and turn it into C code that removes the main interpreter loop; those don't do much for speed.)

"interpreter loop", posted 7 Nov 2002 at 14:03 UTC by crhodes » (Master)

splork: When you say "interpreter loop" you demonstrate that you have slightly missed the point. Why should the loop, which is a nice feature to have for development, be implemented as an interpreter? You are interacting with the development environment (in whichever language you are using); nothing requires this to be implemented as an interpreter.

You might ask how else it could be done. Well, it could be done by compiling the entered code, and then running the compiled code. This isn't always a winning strategy, but it can be. Objections to this on the basis of dynamically typed languages have missed the point; all that this would entail is that the types would need to be checked at runtime, too. It isn't necessary, when compiling to machine code, to compile via something that looks like simple C code — your language environment will already have type tags and the like, so why should the compiled code not use those?
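crhodes's point, that dynamic typing only means compiled code must also check type tags at run time, can be sketched as follows. Python is used only to model the idea: `Tagged`, `compiled_add`, and the tag names are hypothetical, and real compilers typically pack such tags into machine words rather than objects.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Tagged:
    tag: str    # run-time type tag carried with every value
    value: Any


def compiled_add(a: Tagged, b: Tagged) -> Tagged:
    """What a compiler might emit for `a + b`: test tags, then take a fast path."""
    if a.tag == "fixnum" and b.tag == "fixnum":
        return Tagged("fixnum", a.value + b.value)   # plain integer add
    if a.tag == "string" and b.tag == "string":
        return Tagged("string", a.value + b.value)   # concatenation
    # No matching case: report a type error at run time, as a dynamic language would.
    raise TypeError(f"cannot add {a.tag} and {b.tag}")
```

The checks are cheap comparisons compiled straight into the routine, so nothing here requires an interpreter loop; the types are simply checked when the code runs rather than when it is compiled.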

And, as dan and mdanish have said, this isn't theoretical speculation; optimizing Common Lisp environments have been doing this since the 1970s.

Psyco, posted 8 Nov 2002 at 12:11 UTC by jamesh » (Master)

splork: if you are interested in speeding up Python execution, you might want to look at Psyco, which is a specialising compiler for Python. It provides a good speedup with minimal modification to your programs.

Re: splork, posted 9 Nov 2002 at 01:55 UTC by mdanish » (Journeyer)

Please follow the link I posted in my previous response. Hopefully you'll be much less confused afterwards, or at least more enlightened.
