Recent blog entries for gary

2 Sep 2008 »

Shark, now with 100% bytecode coverage

I did jsr, ret and multianewarray today; Shark now has 100% bytecode coverage.

Syndicated 2008-09-02 14:29:51 from gbenson.net

1 Sep 2008 »

DaCapo status

I’ve been working on DaCapo for nearly two weeks now, so I took a bit of time out today to figure out where I am with it:
Benchmark  Status  Detail                   Unimplemented bytecodes
antlr      FAIL    too many open files      jsr (once)
bloat      pass    718149 ms                multianewarray (once)
chart      pass    337240 ms
eclipse    FAIL    requires deoptimization
fop        pass    37126 ms
hsqldb     pass    178120 ms
jython     FAIL    requires deoptimization  jsr (21 times)
luindex    pass    149362 ms                jsr (3 times)
lusearch   FAIL    segfault
pmd        pass    457936 ms                jsr (15 times)
xalan      pass    174340 ms                jsr (8 times)

Note that this is a debug build, with no optimization and assertions enabled, so the times are in no way representative.

Syndicated 2008-09-01 14:52:12 from gbenson.net

29 Aug 2008 »

DaCapo

This past week or so I’ve been trying to get the DaCapo benchmarks running on Shark. It’s a total baptism of fire. ANTLR uses exceptions extensively, so I’ve had to implement exception handling. FOP is multithreaded, so I’ve had to implement slow-path monitor acquisition and release (all of synchronization is now done!). I’ve had to implement safepoints, unresolved field resolution, and unresolved method resolution for invokeinterface. I’ve had to replace the unentered block detection code to cope with the more complex flows introduced by exception handlers. I’ve fixed bugs in the divide-by-zero check, in aload, astore, checkcast and new, and to top it off I implemented lookupswitch for kicks. And I’m only halfway through the set of benchmarks…

Syndicated 2008-08-29 08:11:02 from gbenson.net

28 Aug 2008 »

Building Shark

For reference, this is how to reproduce my working environment and get a debuggable Shark built:

svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
cd llvm
./configure --with-pic --enable-pic
make
cd ..
hg clone http://icedtea.classpath.org/hg/icedtea6
cd icedtea6
curl http://gbenson.net/wp-content/uploads/2008/08/mixtec-hacks.patch | patch -p1
./autogen.sh
LLVM_CONFIG=$(dirname $PWD)/llvm/Debug/bin/llvm-config ./configure --enable-shark
make icedtea-against-ecj

After the initial make icedtea-against-ecj you can use make hotspot to rebuild only HotSpot.

Syndicated 2008-08-28 07:58:48 from gbenson.net

20 Aug 2008 »

Shark 0.03 released

I just updated icedtea6 hg with the latest Shark. The main reason for this release is that Andrew Haley pointed out that the marked-method stuff I was using to differentiate compiled methods and interpreted methods didn’t work on amd64, and while it was possible to make it work there I didn’t like the idea of having something that needs tweaking for each new platform you build on. Now interpreted methods have the same calling convention as compiled ones, which makes the need for differentiation obsolete.

Other new features in this release include support for long, float, and double values, and a massive pile of new bytecodes. Check out the coverage page now, it’s awesome!

Syndicated 2008-08-20 09:03:53 from gbenson.net

15 Aug 2008 »

Debug option fun

I just extended the -XX:+SharkTraceInstalls debug option to print out a load more stuff: statistics on the code size, the number of non-volatile registers used, and so on. If you run with it you’ll get something like this:

[0xd04bd010-0xd04bd1b4): java.lang.String::hashCode (420 bytes code, 32 bytes stack, 1 register)
[0xd04bd1c0-0xd04bd81c): java.lang.String::lastIndexOf (1628 bytes code, 80 bytes stack, 13 registers)
[0xd04bd820-0xd04bdc3c): java.lang.String::equals (1052 bytes code, 48 bytes stack, 5 registers)
[0xd04bdc40-0xd04be2f8): java.lang.String::indexOf (1720 bytes code, 80 bytes stack, 12 registers)
[0xd04be300-0xd04beaf4): java.io.UnixFileSystem::normalize (2036 bytes code, 80 bytes stack, 12 registers)
[0xd04beb00-0xd04c3310): sun.nio.cs.UTF_8$Encoder::encodeArrayLoop (18448 bytes code, 96 bytes stack, 15 registers)
[0xd04c3320-0xd04c348c): java.lang.String::charAt (364 bytes code, 32 bytes stack, 1 register)
[0xd04c3490-0xd04c3530): java.lang.Object::<init> (160 bytes code, 16 bytes stack, 0 registers)
…

This isn’t (just) because I like debug options. Lately I’ve thought of a couple of optimizations I could do, one to reduce the code size for methods with more than one return, and one to cut the number of registers used. The former probably won’t do a lot other than reducing compile time, but the latter should be well worth it. Maybe not so much on PowerPC (though I already have a couple of methods maxed out on registers), but not all platforms have the luxury of 19 callee-saved registers! And, of course, if I’m going to spend time optimizing then I want to see that it worked…
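To check whether an optimization actually helped, the trace lines could be totalled before and after the change. The following is a rough Python sketch, not part of Shark; the line format is assumed from the sample output above, and the `summarize` helper is a hypothetical name.

```python
import re

# Pattern for lines shaped like the sample output above. The format is assumed
# from that sample and may differ in other builds.
TRACE_LINE = re.compile(
    r"\[0x[0-9a-f]+-0x[0-9a-f]+\): (?P<method>\S+) "
    r"\((?P<code>\d+) bytes code, (?P<stack>\d+) bytes stack, "
    r"(?P<regs>\d+) registers?\)"
)

def summarize(lines):
    """Return (methods seen, total code bytes, most registers used by one method)."""
    count = total_code = max_regs = 0
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a trace line
        count += 1
        total_code += int(match["code"])
        max_regs = max(max_regs, int(match["regs"]))
    return count, total_code, max_regs

sample = [
    "[0xd04bd010-0xd04bd1b4): java.lang.String::hashCode (420 bytes code, 32 bytes stack, 1 register)",
    "[0xd04bd1c0-0xd04bd81c): java.lang.String::lastIndexOf (1628 bytes code, 80 bytes stack, 13 registers)",
]
print(summarize(sample))  # prints (2, 2048, 13)
```

Running the same workload with and without a change and comparing the two tuples gives a quick read on code size and register pressure.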

Whilst we’re on the subject of options I found another funky one: -XX:+VerifyBeforeGC. I’ve already fixed one bug using it!

Syndicated 2008-08-15 08:51:03 from gbenson.net

12 Aug 2008 »

Shark 0.02 released

I just updated icedtea6 hg with the latest Shark. To build it you need a fairly recent svn LLVM (I’m using 54012) and you need to configure IcedTea with --enable-shark.

Syndicated 2008-08-12 10:09:23 from gbenson.net

7 Aug 2008 »

Shark stuff

The new framewalker stuff is all done now. For interpreted frames you need to write all kinds of fiddly little accessors so the garbage collector can find the method pointer, the local variables, the monitors and the expression stack, and any other objects that may be lying around in there, but for compiled frames it’s simple: at any point at which the stack could be walked you just emit a map which says “in a stack frame with such and such a PC, slots 1, 2, 4, 5 and 8 contain pointers to objects”. The tricky bit is the PC: I don’t have access to the real one, so I had to fake one up, but it’s all working now — and surviving garbage collections — which is pretty cool! The garbage collector interface was the single biggest thing I was worried about, so it’s nice to have it under my belt, with all the old hacks removed.
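The map idea for compiled frames can be modelled in a few lines. This is a toy Python sketch rather than HotSpot's real data structures: the `oop_maps` table, the fake PC values and the frame contents are all made up for illustration.

```python
# Hypothetical oop maps: fake PC -> stack slots that hold object pointers.
oop_maps = {
    0x10: (1, 2, 4, 5, 8),
    0x24: (1, 4),
}

def object_slots(frame_slots, fake_pc):
    """Yield (slot index, value) for each slot the map marks as an object pointer."""
    for index in oop_maps[fake_pc]:
        yield index, frame_slots[index]

# A frame with nine slots; strings stand in for object pointers, ints for raw data.
frame = [7, "objA", "objB", 42, "objC", "objD", 0, 0, "objE"]
roots = list(object_slots(frame, 0x10))
```

The garbage collector only needs the frame and the PC to find every object reference, with no per-frame accessors.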

Since finishing the framewalking stuff I’ve also implemented VM calls, which are the places where compiled code drops into C to do things too complicated to want to write in assembly. Making Shark fail gracefully when it hits unknown bytecodes was an amazing idea, as it shifted the focus from the simple grind of implementing bytecodes to the really critical — and interesting! — things. Doing it this way around means I can get all the infrastructure solid, then spend a week or so churning out the remaining ninety or so bytecodes.
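As a rough illustration of failing gracefully, here is a Python sketch. It assumes that hitting an unknown bytecode simply abandons the compile and leaves the method to the interpreter; the `IMPLEMENTED` set and the function names are hypothetical, not Shark's.

```python
class UnsupportedBytecode(Exception):
    """Raised when the compiler meets a bytecode it cannot translate yet."""

# Hypothetical subset of implemented bytecodes, for illustration only.
IMPLEMENTED = {"aload", "astore", "checkcast", "new", "lookupswitch"}

def compile_method(bytecodes):
    """Pretend to compile; bail out on the first unimplemented bytecode."""
    for op in bytecodes:
        if op not in IMPLEMENTED:
            raise UnsupportedBytecode(op)
    return "compiled"

def install(bytecodes, missing_counts):
    """Try to compile; on failure record the bytecode and keep interpreting."""
    try:
        return compile_method(bytecodes)
    except UnsupportedBytecode as exc:
        op = exc.args[0]
        missing_counts[op] = missing_counts.get(op, 0) + 1
        return "interpreted"
```

Counting the misses as a side effect also shows which bytecodes are worth implementing next.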

Syndicated 2008-08-07 08:17:57 from gbenson.net

28 Jul 2008 »

New framewalker interface

I got recursive locks working on Friday, which got me back into the framewalker stuff. For HotSpot’s framewalker to see frames as native I need to supply it with something like a program counter that can be used to index into a set of tables that tell it, for example, which stack slots contain pointers for consideration by the garbage collector. It expects this to be in a block of generated code (which won’t really be code at all in Shark), but the core problem is that the “code” you generate goes into a temporary buffer which HotSpot then relocates into the final location, so I can’t simply inline pointers from the buffer into Shark’s output. The final location of the “code” cannot be determined at compile time, and even if it could, it can move at any time as a result of garbage collector activity.

When you invoke a method in zero you start with a methodOop, a pointer to a structure containing (amongst other things) the method’s entry point. The entry point is simply a pointer to the function that you call to execute the method. The address of the final code buffer is also contained within the methodOop, but both the entry point and the code buffer are volatile — they can change at any time — so they need to be read at the same time, in one atomic operation.

What is needed is some way to pass a pointer to the code buffer when calling Shark methods. After a fairly intense thinking session it occurred to me that the entry point is going to be word-aligned, so the bottom two or three bits will always be zero. Code buffer pointers in HotSpot are always word aligned too, so I decided to use the bottom bit as a flag: if the bottom bit is clear then the entry point is a normal pointer-to-a-function entry point, but if it’s set then the “entry point” is really a pointer to the code buffer. The actual entry point can then be read from the code buffer, which in Shark does not contain code but simply whatever data I decide to put in there.

The nice thing about this is that, aside from adding only one or two instructions per method dispatch, it also opens up the possibility of method inlining, something I didn’t think would be possible.
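The bottom-bit flag can be modelled with plain integers. This is a Python sketch only: addresses are ints, "memory" is a dict, and storing the real entry point in the first word of the code buffer is an assumption made for the example, since the buffer layout is not specified here.

```python
WORD_SIZE = 4   # assumed 32-bit words, so aligned pointers end in binary 00
TAG_BIT = 0x1   # bottom bit set means "this is really a code buffer pointer"

def tag_code_buffer(code_buffer_addr):
    """Encode a code buffer address as a tagged 'entry point'."""
    assert code_buffer_addr % WORD_SIZE == 0, "code buffers must be word aligned"
    return code_buffer_addr | TAG_BIT

def resolve_entry_point(entry_point, memory):
    """Return (function address, code buffer address or None) from one value."""
    if entry_point & TAG_BIT:
        code_buffer = entry_point & ~TAG_BIT
        # Assumption: the real entry point sits in the first word of the buffer.
        return memory[code_buffer], code_buffer
    return entry_point, None

# Toy memory: one code buffer whose first word holds the function address.
memory = {0xd04bd010: 0x08048000}
tagged = tag_code_buffer(0xd04bd010)
```

Because the flag travels inside the single word read from the methodOop, the caller learns both the entry point and the code buffer from one read, which is what the atomicity requirement needs.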

Syndicated 2008-07-28 09:33:25 from gbenson.net

25 Jul 2008 »

Speed demon

I got whole method synchronization working in Shark yesterday, at least for simple cases (non-recursive, uncontended locks). It’s a small thing, but it means that everything that Shark needs in a stack frame is actually in the stack frame. When I get back onto the framewalker stuff I’ll be able to start adding whatever extra stuff I need without worrying about messing myself up for the future.

I’ve been using jar on the HotSpot sources as my testcase, and lately it’s been noticeably faster. I did a couple of quick timing runs yesterday and even with only 62 methods compiled it is already twice as fast with Shark as without. I’m on the verge of massive success…

Syndicated 2008-07-25 07:50:54 from gbenson.net
