Older blog entries for michi (starting at number 6)

WAQL-PP 0.1 released

With this post I am proud to announce the first release of WAQL-PP, a WAQL Preprocessor for Java that I have been working on for the last two weeks. In an earlier post I described the motivation behind this little project and how I planned to implement it. I’m rather satisfied with the result, so without further ado, here is a copy of the release notes for this version. If you are interested, just visit the project page to check it out.

WAQL-PP 0.1 released.
 
This is the first release of the WAQL Preprocessor for Java. Here is a short
list of the most important features:
 
  * Resolves Data Dependencies between separate queries by converting
    replacement objects into a textual representation.
  * Handles nested Data Dependencies from innermost to outermost.
  * Transforms Template List constructs into valid XQuery for-clauses and
    handles correlations between different Template Lists.
  * Parser tested against the XML Query Test Suite (XQTS).
 
This release was developed against and tested with Java SE 1.6.0_22. It uses
Apache Ant as a build tool, JUnit 4.8.2 for testing purposes, JavaCC 5.0 as a
parser generator and has no additional runtime dependencies. It is currently
being used as a component in the WS-Aggregation framework.
 
Information about the project and general documentation can be found on
http://www.antforge.org/waqlpp
 
The WAQL-PP 0.1 release packages can be downloaded from
http://www.antforge.org/waqlpp/download/waqlpp-0.1/
 
File   : waqlpp-0.1-src.zip
md5sum : 57d06bfedaf1abd6eeed793838d96fc7
sha1sum: 1a5fd2196a0916fd74479c4e7aaa57811b673e3b
 
File   : waqlpp-0.1.jar
md5sum : bf97850f878014090eb9b9849e18ab37
sha1sum: 74c4e0e7e78bc16fea4bfd1b0954439e74636118
 
Enjoy!
Michael Starzinger

Syndicated 2010-11-11 22:07:58 from michi's blog

WAQL-PP: Preprocessor for a Data Aggregation Query Language

This week I started to design and implement a preprocessor for the Web-service Aggregation Query Language (WAQL), which is an extension of XQuery. This language is used as part of the WS-Aggregation framework developed at the Distributed Systems Group of the Vienna University of Technology. With this text I want to explain the motivation behind WAQL and how the preprocessor will be designed. The motivation is nicely stated in my task description:

The key idea of WAQL is that it provides a convenience syntax for XQuery, which otherwise tends to become complex and hardly comprehensible in bigger scenarios. WAQL queries are transformed into valid XQuery expressions, which are finally executed by a (third-party) XQuery engine.

First of all we need to get a grasp of what the WAQL extensions to the XQuery language are. Since WAQL is still in its experimental stages, there is no exact specification of the language and it may change or grow over time. At the moment WAQL consists of two language constructs:

  • Template Lists: This extension tries to simplify the specification of generated inputs. It is basically syntactic sugar for an XQuery for-loop construct and as such can be transformed easily.
  • Data Dependencies: This second extension is the interesting one. It can express dependencies between several different queries. The framework has to identify these dependencies and execute the queries in a valid order, so that all dependencies can be resolved.

The above two constructs should explain why the actual transformation has to be split into several phases which can be triggered by the framework at different points in time. The separate steps the preprocessor has to perform are as follows:

  1. Parsing: The textual WAQL query is parsed and an intermediate representation is constructed. Since WAQL is an extension which enhances the set of expressions for the XQuery language, the actual parser has to understand the full XQuery grammar. This may sound like a lot of work, but the XQuery specification provides a detailed description of the grammar in about 140 EBNF rules. So defining a valid parser is a doable job.
  2. Resolving of data dependencies: At this point the preprocessor has generated a list of all unresolved data dependencies. However, the preprocessor has no idea which other queries are linked to the one currently being processed. So the actual resolving has to be done by the framework; the preprocessor just adapts the intermediate representation to the data provided by the framework.
  3. Transformation: Once all dependencies have been resolved the intermediate representation can be transformed back into a textual XQuery (without any WAQL extensions), which can then be passed on to a third-party XQuery engine.
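The three phases can be illustrated with a toy sketch. This is not WAQL-PP code: the class name ToyPreprocessor, its methods and the "@{name}" marker are all made up for illustration (the marker is not WAQL syntax), and the "parser" only splits the text at those markers instead of understanding the XQuery grammar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy three-phase preprocessor; "@{name}" is a made-up marker, not WAQL syntax. */
class ToyPreprocessor {
    /** Intermediate representation: a segment is plain text or a dependency. */
    private static final class Segment {
        final String text;        // literal query text, or the dependency name
        final boolean dependency; // true while the dependency is unresolved

        Segment(String text, boolean dependency) {
            this.text = text;
            this.dependency = dependency;
        }
    }

    private final List<Segment> segments = new ArrayList<Segment>();

    /** Phase 1: parse the textual query into the intermediate representation. */
    static ToyPreprocessor parse(String query) {
        ToyPreprocessor p = new ToyPreprocessor();
        int pos = 0;
        while (pos < query.length()) {
            int start = query.indexOf("@{", pos);
            int end = start < 0 ? -1 : query.indexOf('}', start);
            if (start < 0 || end < 0) {
                // No further marker: the rest is plain text.
                p.segments.add(new Segment(query.substring(pos), false));
                break;
            }
            if (start > pos) {
                p.segments.add(new Segment(query.substring(pos, start), false));
            }
            p.segments.add(new Segment(query.substring(start + 2, end), true));
            pos = end + 1;
        }
        return p;
    }

    /** Phase 2, first half: tell the framework what is still unresolved. */
    Set<String> unresolved() {
        Set<String> names = new LinkedHashSet<String>();
        for (Segment s : segments) {
            if (s.dependency) {
                names.add(s.text);
            }
        }
        return names;
    }

    /** Phase 2, second half: the framework provides textual replacements. */
    void resolve(Map<String, String> values) {
        for (int i = 0; i < segments.size(); i++) {
            Segment s = segments.get(i);
            if (s.dependency && values.containsKey(s.text)) {
                segments.set(i, new Segment(values.get(s.text), false));
            }
        }
    }

    /** Phase 3: emit the final text; refuses while dependencies remain. */
    String transform() {
        if (!unresolved().isEmpty()) {
            throw new IllegalStateException("unresolved: " + unresolved());
        }
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            sb.append(s.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyPreprocessor p = parse("for $x in @{ids} return $x * 2");
        System.out.println("unresolved: " + p.unresolved());
        p.resolve(Collections.singletonMap("ids", "(1, 2, 3)"));
        System.out.println(p.transform());
    }
}
```

The point of keeping the phases as separate calls is that the framework can parse a query early, ask for the open dependencies, and only trigger the final transformation once it has executed the queries those dependencies refer to.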

Now that the basic operations are defined, we are able to give a rough description of the WAQL preprocessor and how it can be embedded into the existing framework. The two basic modules are a generated parser (performing the parsing step) and a driving engine (performing the resolving and transformation steps). The parser will most certainly be generated using the JavaCC parser generator. The graphic below should explain the architecture.

[Figure: Architecture of the WAQL preprocessor]

Note that the above explanation is written from the compiler-constructor point of view; it only covers the preprocessor as part of the framework. All the other nasty details of WS-Aggregation are beyond the scope of this text. If you are interested, you should read the paper or contact Waldemar Hummer, who was kind enough to explain it to me. I will also continue to write about the ongoing development of the preprocessor, so stay tuned.

Update: This text was crossposted to the DSG Praktika Blog as well.

Syndicated 2010-10-25 23:14:17 from michi's blog

Puzzling Java statement of the day

Some days ago I stumbled across a Java statement which I thought was trivial at first, only to discover that I had no idea. I have a reasonable understanding of what a JVM does and how Java bytecode is executed. But as this example shows once again, that doesn’t necessarily carry over to the Java programming language. The snippet below should explain my point.

int lorem = 1, ipsum = 2, dolor = 3;
if (lorem == (lorem = ipsum))
	f();
if ((ipsum = dolor) == ipsum)
	g();

Which of the above two methods f() and g() is actually invoked? Can you tell without compiling the code? Possible answers are:

  • Neither of the two methods is invoked; both call-sites are dead code.
  • Just f() is invoked and the call-site of g() is dead code.
  • Just g() is invoked and the call-site of f() is dead code.
  • Both methods are invoked; the conditions are pointless.

Ironically, I finally understood what was going on after looking at the generated bytecode (good old javap is your friend). I am not posting the disassembled code because that would spoil the fun. But once you look at it, the answer appears to be quite obvious.
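If you would rather run the snippet than stare at it, a small harness along the following lines will do. The Puzzle class and the two boolean flags are additions for illustration; the statements in run() are the ones from above, unchanged. No spoilers here either: run it, or disassemble it with javap -c Puzzle.

```java
/** Harness for the puzzle; f() and g() only record that they were called. */
class Puzzle {
    static boolean fCalled = false;
    static boolean gCalled = false;

    static void f() {
        fCalled = true;
    }

    static void g() {
        gCalled = true;
    }

    /** The statements from the post, unchanged. */
    static void run() {
        int lorem = 1, ipsum = 2, dolor = 3;
        if (lorem == (lorem = ipsum))
            f();
        if ((ipsum = dolor) == ipsum)
            g();
    }

    public static void main(String[] args) {
        run();
        System.out.println("f() invoked: " + fCalled);
        System.out.println("g() invoked: " + gCalled);
    }
}
```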

Syndicated 2010-07-30 20:44:50 from michi's blog

Goodbye Cacao ...

As some of you might have heard (or deduced from my lack of activity), I have left Cacao. This is not a decision I took lightly, hence it took me some time to make it official by means of this post.

I started my work on Cacao in 2005 with my first project being the ARM port of the code generator, which turned out to be my bachelor thesis and hooked me up with Cacao. I continued to actively contribute to the development of Cacao and tried to help push it towards being a real Java Virtual Machine. Since then a lot has changed in Cacao. I have learned a lot from all the contributors, former maintainers and other people I worked with and for that I am very grateful.

One of my most recent endeavors (also happening to be my diploma thesis) was to prepare Cacao to cope with exact garbage collection, a topic which was neglected for too long in Cacao. A project that big requires a lot of infrastructure. Once you want to do the cool stuff, you realize that more and more of those tiny bits and pieces are missing. Nevertheless you still want to do all the really cool stuff, to be able to compete with others out there. The essence is that it’s just too much work for a single person to effectively push forward the development of a mature JVM like Cacao in time.

The future has a natural tendency to resist prediction, so I won’t even try to make one for Cacao. But what I can say is that there are two people taking over maintenance of the code, namely David Flamme and Stefan Ring, both capable and motivated to do that.

However, I don’t want to leave Cacao without a vision. During my time at Theobroma Systems I learned a lot about microkernels on embedded systems and picked up some of the enthusiasm about them from my former colleagues there. In my opinion, portability is what microkernel-based operating systems are really missing and a JVM might provide. To be sufficiently efficient, that VM needs to run directly on top of the hypervisor instead of being throttled by several compatibility layers. At first it would only be some kind of Micro Edition (if at all), but with some effort Cacao might be able to pull this off.

As for myself, I am looking forward to whatever the future may hold for me, and will keep you posted …

Syndicated 2010-05-14 14:19:38 from michi's blog

Why I don't like Java RMI and how I use it anyways

The Java Remote Method Invocation API is a great thing to have because it is available in almost every J2SE runtime environment without adding further dependencies. However there are some implications when using RMI and I just cannot get my head around them:

  1. Interfaces used as remote interfaces need to extend java.rmi.Remote. Interfaces should be clean and not contain any clutter introduced by a certain technology. This is even more true with modern frameworks and things like dependency injection.
  2. Remote methods need to declare java.rmi.RemoteException in their throws clause. This is basically a continuation of the first point. This point holds, even if you ignore the rant about checked exceptions, which I don’t want to comment on right now.
  3. Remote objects need to be exported explicitly. Even though one explicitly declared which methods should be accessible from a remote site with the above two points, one still needs to explicitly export every single instance of an object implementing those methods.
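For reference, this is roughly what the three points look like in plain RMI. It is only a minimal sketch: the Greeter interface, its implementation and the PlainRmiDemo class are made up for illustration, and the demo exports a single object on an anonymous port and calls it through its own stub inside one JVM.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

/** Plain RMI, showing where each of the three requirements shows up. */
class PlainRmiDemo {
    // (1) The interface has to extend java.rmi.Remote.
    public interface Greeter extends Remote {
        // (2) Every remote method has to declare RemoteException.
        String greet(String name) throws RemoteException;
    }

    public static class GreeterImpl implements Greeter {
        @Override
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Exports one instance, calls it through its stub, and unexports it. */
    static String callThroughStub(String name) {
        // Keep the demo on the loopback interface (standard RMI property).
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        GreeterImpl impl = new GreeterImpl();
        try {
            // (3) Each instance has to be exported explicitly;
            //     port 0 lets RMI pick an anonymous port.
            Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
            try {
                return stub.greet(name);
            } finally {
                UnicastRemoteObject.unexportObject(impl, true);
            }
        } catch (RemoteException e) {
            throw new IllegalStateException("RMI call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughStub("RMI"));
    }
}
```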

Don’t get me wrong, all those implications have their right to exist because the decisions leading up to them were made for a reason. But in some circumstances those reasons don’t apply. It is just not the one-to-rule-them-all solution for remote method invocation in Java.

There are ways around those problems. One could for instance duplicate the existing interfaces to fit the needs of RMI. But frankly speaking, I just don’t want to do that myself.


That being said, let’s see if the task of separating the transportation layer based on RMI from your precious interfaces can be automated in some way, so it doesn’t have to be done by hand. The following are the key points of the approach:

  • Interfaces can explicitly be marked as remote interfaces at runtime without the need to recompile them. All methods exposed by such an interface can be invoked from a remote site. All parameters whose types implement such an interface will be passed by reference and will not be serialized. This is the same behavior as if the interface extended java.rmi.Remote in the RMI world. The actual remote interfaces are generated on demand at runtime.
  • Provide a proxy factory which supports the rapid development of a transportation layer based on RMI for given clean interfaces. The interface classes do not need to be cluttered with specifics of the transportation implementation.
  • A proxy in this context is a transparent object implementing both the local and the generated remote interface. Both interfaces are usable:
    • Cast the proxy to java.rmi.Remote and use it with any naming or registry service available to the RMI world. Every proxy implicitly is a remote object without the need for explicitly exporting it.
    • Cast the proxy to your local interface and don’t bother whether it actually targets a local or a remote site.
  • The decision how an invocation actually is dispatched can be solely based on whether the target object of a proxy is a remote or a local one. This decision is hidden inside the transportation layer.

Available as a download attached to this post you’ll find a first reference implementation of such a proxy factory as described above. Note that it is just a sketch to illustrate my point and will probably contain major flaws. Also it brings a dependency on Javassist, which kind of contradicts the very first sentence of this post. However it is capable of distributing this tiny example across several sites without modifying the given interfaces, which also represents my only test-case:

public interface Client {
	public void callback(String message);
}

public interface Server {
	public void subscribe(Client subscriber);
	public void notify(String message);
}

public class ClientImpl implements Client {
	public void callback(String message) {
		// ... do some important job ...
	}
}

public class ServerImpl implements Server {
	public void subscribe(Client subscriber) {
		// ... remember "subscriber" in some fancy data structure ...
	}
	public void notify(String message) {
		// ... invoke all "subscribers" like they were local ...
	}
}
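To give an idea of what a transparent proxy for such a clean interface can look like, here is a much smaller sketch than the attached implementation. It is not the Javassist-based code from the download: it uses the JDK's java.lang.reflect.Proxy instead, it generates no remote interface and handles only the local case, and the names ProxySketch, createProxy and demo are made up for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a transparent proxy for a clean interface (local case only). */
class ProxySketch {
    /** Same shape as the Client interface above: no Remote, no RemoteException. */
    public interface Client {
        void callback(String message);
    }

    /** Hypothetical factory; the remote branch is left out on purpose. */
    @SuppressWarnings("unchecked")
    static <T> T createProxy(Class<T> iface, final T localTarget) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                // A real transportation layer would decide here whether the
                // target is local or remote; this sketch only knows "local".
                try {
                    return method.invoke(localTarget, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the target itself threw
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }

    /** Calls a local Client through the proxy and returns what it received. */
    static List<String> demo() {
        final List<String> received = new ArrayList<String>();
        Client local = new Client() {
            @Override
            public void callback(String message) {
                received.add(message);
            }
        };
        Client proxy = createProxy(Client.class, local);
        proxy.callback("hello");
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller only ever sees the Client type, which is the property the approach is after: whether the invocation ends up on a local or a remote target stays hidden behind the handler.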

This is my attempt to show how I personally think that RMI should have been designed in the first place. Please feel free to comment, improve, ignore or flame.

Syndicated 2010-04-06 20:05:16 from michi's blog

Cacao supports JMX Remote Monitoring and Management

For a few days now, Cacao has been able to start OpenJDK's JMX Monitoring and Management Agent if requested to do so. This allows you to remotely connect to Cacao with any JMX-compliant monitoring tool. One of the main responsibilities of this agent is to act as a server for MBeans (managed beans). The JRE provides some basic MBeans which allow out-of-the-box monitoring and management of VM internals, but applications can easily extend the functionality by providing custom MBeans. If you want to learn more about this topic, you should visit OpenJDK's JMX group.

One such JMX-compliant monitoring tool is JConsole, which comes bundled with most J2SDK installations. Below you see a JConsole from Apple's Java running on my Mac OS X workstation, connected to Cacao running on a remote Linux machine.

[Figure: JConsole connected to Cacao]
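As for the custom MBeans mentioned above, a minimal sketch of one could look like this. It only uses the standard javax.management API; the RequestCounter bean, the CustomMBeanDemo class and the object name org.example.demo:type=RequestCounter are made up for illustration. While such a bean is registered, a tool like JConsole would list it next to the built-in ones.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Registers a tiny custom MBean and reads one attribute back. */
class CustomMBeanDemo {
    /** Standard MBean convention: interface name = class name + "MBean". */
    public interface RequestCounterMBean {
        int getCount(); // exposed as the read-only attribute "Count"

        void reset();   // exposed as an operation
    }

    public static class RequestCounter implements RequestCounterMBean {
        private int count;

        public synchronized void increment() {
            count++;
        }

        @Override
        public synchronized int getCount() {
            return count;
        }

        @Override
        public synchronized void reset() {
            count = 0;
        }
    }

    /** Registers the bean, reads "Count" through the server, unregisters it. */
    static int registerAndRead() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RequestCounter counter = new RequestCounter();
        counter.increment();
        counter.increment();
        try {
            // Hypothetical object name, chosen for this example only.
            ObjectName name =
                    new ObjectName("org.example.demo:type=RequestCounter");
            server.registerMBean(counter, name);
            try {
                return (Integer) server.getAttribute(name, "Count");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Count attribute: " + registerAndRead());
    }
}
```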

Note that there (still) are some restrictions on the current support:

  1. Some of the VM internal management functions are not yet fully implemented. Those functions are defined by HotSpot's JMM interface (the thing called jmm.h). It will take some time and patience until all of them are implemented.
  2. Only OpenJDK provides a reference implementation of the JMX agent, so at the moment there is no support for GNU Classpath.
  3. What baffled me most was that the documentation stated that applications running on the same machine inside another HotSpot VM process can be monitored without starting the JMX agent. I found out that HotSpot creates a shared memory region to which another VM process can attach. I don't like the idea of sharing memory across VM processes at all, so Cacao does not (and probably never will) support this feature. But I implemented the necessary stubs to avoid UnsatisfiedLinkErrors and make everything run smoothly. So don't be surprised if you can't see a list of locally running Cacao processes in JConsole. If you are interested, all the functionality to access this shared memory is hidden in sun.misc.Perf.

And finally, how do you make Cacao start the JMX agent? Try the snippet below. If you want to know more about those magic properties, try one of the thousand other articles out there dealing with this topic.

$ java -Dcom.sun.management.jmxremote \
        -Dcom.sun.management.jmxremote.port=9999 \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false \
        <your-main-class>

Have fun monitoring and managing!

Syndicated 2009-10-28 19:13:32 from michi's blog

First blog post

After several months of struggle and countless efforts trying to avoid it, I finally did it. This is my very first blog post. Furthermore it is a greeting to all the people out there who are interested in what I have to say.

Syndicated 2009-10-26 17:29:32 from michi's blog
