joolean is currently certified at Journeyer level.

Name: Julian Graham
Member since: 2004-12-07 17:48:29
Last Login: 2015-02-16 20:54:22




Recent blog entries by joolean

16 Feb 2015 (updated 16 Feb 2015 at 18:11 UTC)
Transactional B-trees

The next version of gzochi is going to include a new storage engine implementation that keeps data entirely in memory while providing transactional guarantees around atomicity of operations and visibility of data. There are a couple of motivations for this feature. The primary reason is to prepare the architecture for multi-node operation, which, as per Blackman and Waldo's technical report on Project Darkstar, requires that the game server -- which becomes, in a distributed configuration, more like a task execution server -- maintain a transient cache of game data delivered from a remote, durable store of record. The other is to offer an easier mechanism for "quick-starting" a new gzochi installation, and to support users who, for political or operational reasons, prefer not to bundle or install any of the third-party databases that support the persistent storage engines.

That first motivation wouldn't bias me much in either direction on the build-vs-buy question; Project Darkstar's authors almost certainly planned to implement this using Berkeley DB's "private" mode (example here). However, gzochi is intentionally agnostic when it comes to storage technology. The database that underlies a storage engine implementation needs only to support serializably isolated cursors and reasonable guarantees around durability; also requiring purely in-memory operation would be a heavy burden. And I feel too ambivalent about Oracle to pin the architecture to what BDB supports, AGPL or no. (The Darkstar architects should have been a bit warier themselves.) So I settled on the "build" side of the balance. ...Although my first move was to look for some source code to steal. And I came up weirdly short. The following is a list of the interesting bits and dead ends I came across while searching for transactional B-tree implementations.

Some more specific requirements: There are two popular flavors of concurrency control for the kind of data structure I wanted to build, given the serializable level of transactional isolation I intended to provide. Pessimistic locking requires that all access to the structural or data content of the tree by different agents (threads, transactions) be done under the protection of an explicit read or write lock, depending on the nature of the access. Optimistic locking often comes in the form of multi-version concurrency control, and offers each agent a mutable snapshot of the data over the lifetime of a transaction, mostly brokering resolutions to conflicts only at commit time. Each approach has its advantages: MVCC transactions never wait for locks, which usually makes them faster. Pessimistic locking implementations typically detect conflicting access patterns earlier than optimistic implementations, which wait until commit to do so. Because gzochi transactions are short and fine-grained, and the user experience is sensitive to latency, I believe that the time wasted by unnecessary execution of "doomed" transactional code is better avoided via pessimistic locking. (I could be wrong.)
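To make the pessimistic side concrete, here's a minimal sketch of a per-key lock table with shared (read) and exclusive (write) modes. It's hypothetical, not gzochi's code: lock_table, lock_acquire, lock_release_all and the fixed-size arrays are illustrative. Instead of blocking, it reports a conflict immediately, which is the point where a pessimistic engine can abort (or choose to wait on) a doomed transaction before running any more of its code.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 64
#define MAX_HOLDERS 8
#define MAX_KEY 32

typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } lock_mode;

/* One entry per locked key (hypothetical fixed-size layout, for brevity). */
typedef struct {
  char key[MAX_KEY];
  lock_mode mode;
  int holders[MAX_HOLDERS]; /* ids of transactions holding this lock */
  int n_holders;
} lock_entry;

typedef struct {
  lock_entry entries[MAX_LOCKS];
  int n_entries;
} lock_table;

static lock_entry *find_or_create(lock_table *t, const char *key) {
  for (int i = 0; i < t->n_entries; i++)
    if (strcmp(t->entries[i].key, key) == 0)
      return &t->entries[i];
  if (t->n_entries == MAX_LOCKS || strlen(key) >= MAX_KEY)
    return NULL;
  lock_entry *e = &t->entries[t->n_entries++];
  strcpy(e->key, key);
  e->mode = LOCK_NONE;
  e->n_holders = 0;
  return e;
}

static bool is_holder(const lock_entry *e, int txn) {
  for (int i = 0; i < e->n_holders; i++)
    if (e->holders[i] == txn)
      return true;
  return false;
}

/* Try to lock key for transaction txn. Returns false on conflict instead
   of blocking, so the caller can abort the transaction early. */
bool lock_acquire(lock_table *t, int txn, const char *key, lock_mode mode) {
  lock_entry *e = find_or_create(t, key);
  if (e == NULL)
    return false;
  if (e->n_holders == 0) {             /* unlocked: grant outright */
    e->mode = mode;
    e->holders[0] = txn;
    e->n_holders = 1;
    return true;
  }
  if (is_holder(e, txn)) {
    if (mode == LOCK_SHARED || e->mode == LOCK_EXCLUSIVE)
      return true;                     /* already held strongly enough */
    if (e->n_holders == 1) {           /* sole reader may upgrade */
      e->mode = LOCK_EXCLUSIVE;
      return true;
    }
    return false;                      /* upgrade blocked by other readers */
  }
  if (mode == LOCK_SHARED && e->mode == LOCK_SHARED
      && e->n_holders < MAX_HOLDERS) {
    e->holders[e->n_holders++] = txn;  /* readers share the lock */
    return true;
  }
  return false;                        /* read/write or write/write conflict */
}

/* Release every lock held by txn (at commit or rollback). */
void lock_release_all(lock_table *t, int txn) {
  for (int i = 0; i < t->n_entries; i++) {
    lock_entry *e = &t->entries[i];
    for (int j = 0; j < e->n_holders; j++) {
      if (e->holders[j] == txn) {
        e->holders[j] = e->holders[--e->n_holders];
        break;
      }
    }
    if (e->n_holders == 0)
      e->mode = LOCK_NONE;
  }
}
```

A real engine would also need deadlock detection or timeouts if it chose to wait rather than fail fast; this sketch sidesteps that by never waiting.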

Here's what I found:
  • Apache Mavibot - Transactional B-tree implementation in Java using MVCC. Supports persistent and in-memory storage. Hard for me to tell from reading their source code how their in-memory mode could possibly work for multi-operation transactions.
  • Masstree - Optimistically concurrent, in-memory non-transactional B+tree implementation designed to better exploit SMP hardware.
  • Silo - Optimistically concurrent, in-memory transactional store that uses Masstree as its B-tree implementation.
  • SQLite - Lightweight SQL implementation with in-memory options, with a transaction-capable B-tree as the underlying storage. Their source code is readable, but the model of threads, connections, transactions, and what they call "shared cache" is hard to puzzle out. The B-tree accumulates cruft without explicit vacuuming. The B-tree code is enormous.
  • eXtremeDB - Commercial in-memory database with lots of interesting properties (pessimistic and MVCC modes, extremely low claimed latencies) but, you know, no source code. So.
Because I was unable to find any pessimistic implementations with readily stealable source code, I struck out on my own. It took me about a week to build my own pessimistic B+tree, using Berkeley DB's code and architecture as a surprisingly helpful guide. My version is significantly slower than BDB, even with BDB's persistence to disk enabled, but I'm going to tune it and tweak it and hopefully get it to holler if not scream.

gzochi 0.7 is out. Get it here.

This was a tough release to get out, in no small part because I'd decided to provide a suite of data manipulation tools to support the practical administration of a gzochid server. I wanted to make it easy to export and import data, for the purposes of backup and restore, and to perform large-scale schema transformations of serialized data, to support changes to data structures that occur as part of the evolution of a game application.

The first two were (relatively) easy. I looked for prior art and found some in db_dump and db_load, the utilities that ship with Berkeley DB, which is the most mature (and as of last year, the most appealingly licensed) of the storage engines the server supports. It was pretty easy to make higher-level tools that do the same thing (read and write database rows in a portable format) in terms of gzochid's abstract storage API: gzochi-dump and gzochi-load.
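The heart of a portable dump is an unambiguous text encoding for arbitrary key and value bytes. The sketch below writes each row as two hex-encoded lines, loosely modeled on the "bytevalue" style db_dump uses. The exact layout and the names dump_row and load_hex are assumptions for illustration, not the actual gzochi-dump format.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes as lowercase hex, preceded by a space, ending in a newline. */
static void write_hex(FILE *out, const unsigned char *buf, size_t len) {
  fputc(' ', out);
  for (size_t i = 0; i < len; i++)
    fprintf(out, "%02x", buf[i]);
  fputc('\n', out);
}

/* Emit one row: a key line followed by a value line. */
void dump_row(FILE *out, const unsigned char *key, size_t key_len,
              const unsigned char *val, size_t val_len) {
  write_hex(out, key, key_len);
  write_hex(out, val, val_len);
}

static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Decode one dumped line (optional leading space, then hex pairs) into buf.
   Returns the number of bytes decoded, or -1 on malformed input. */
long load_hex(const char *line, unsigned char *buf, size_t cap) {
  if (*line == ' ')
    line++;
  size_t digits = strcspn(line, "\n");  /* ignore the trailing newline */
  if (digits % 2 != 0 || digits / 2 > cap)
    return -1;
  for (size_t i = 0; i < digits; i += 2) {
    int hi = hex_digit(line[i]), lo = hex_digit(line[i + 1]);
    if (hi < 0 || lo < 0)
      return -1;
    buf[i / 2] = (unsigned char) ((hi << 4) | lo);
  }
  return (long) (digits / 2);
}
```

Hex costs twice the space of the raw bytes, but it survives any text-oriented transport and makes it trivial to tell where one row ends and the next begins.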

Migrating data is a different story. It's not enough to process the stream of rows as they're emitted from the object store. The objects that make up the state of a game form a graph, so you need to traverse that graph, marking nodes as they're visited. This also means that migrations must be done mostly in-place; fully transforming any single reachable node in the graph may require adding new nodes, removing existing ones, or changing the structure of other parts of the graph. Furthermore, the nature of the transformation to be done complicates the process: Each row of data is effectively a plain byte array, the only metadata being a prefix that provides a key into the application's registry of types; the rest of the data has no structure outside of what the application's serialization code applies to it. To transform the data but preserve its "type," two different serializers must be used -- one to read the original data in, the other to write the transformed data back out. This is indeed a problem RedDwarf Server faced because of its reliance on Java's built-in object serialization mechanism. Some rather pointed discussion of the problem can be found in this forum thread. Someone on that same thread mentions ICE and its sub-project, Freeze, which apparently solves a similar problem.
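The traversal half of that problem can be sketched without any storage engine at all. Everything below is hypothetical: objects are identified by small integer ids, each object's outgoing references are known up front, and read_old/write_new stand in for the application's two serializers. A visited array ensures that each reachable object is transformed exactly once, even when the graph has cycles or shared references. The sketch deliberately ignores the harder case where transforming a node adds or removes other nodes.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 128
#define MAX_REFS 4

/* Hypothetical stand-in for a stored object: a payload plus the ids
   of the objects it references. */
typedef struct {
  int payload;             /* placeholder for serialized data */
  int refs[MAX_REFS];
  int n_refs;
} object;

/* Stand-ins for the "input" and "output" serializers. */
typedef int (*read_fn)(const object *);
typedef void (*write_fn)(object *, int value);

/* Example serializers: "version 1" stores a value as-is, "version 2" scales it by 10. */
int read_v1(const object *o) { return o->payload; }
void write_v2(object *o, int value) { o->payload = value * 10; }

/* Migrate every object reachable from root, visiting each one once.
   Returns the number of objects migrated. */
size_t migrate(object *store, size_t n, int root,
               read_fn read_old, write_fn write_new) {
  bool visited[MAX_OBJECTS] = { false };
  int stack[MAX_OBJECTS];
  size_t top = 0, migrated = 0;

  if (root < 0 || (size_t) root >= n || n > MAX_OBJECTS)
    return 0;
  stack[top++] = root;
  visited[root] = true;
  while (top > 0) {
    object *obj = &store[stack[--top]];
    /* Deserialize with the old format, reserialize with the new one. */
    write_new(obj, read_old(obj));
    migrated++;
    for (int i = 0; i < obj->n_refs; i++) {
      int next = obj->refs[i];
      if (next >= 0 && (size_t) next < n && !visited[next]) {
        visited[next] = true;  /* mark on push, so each id is pushed at most once */
        stack[top++] = next;
      }
    }
  }
  return migrated;
}
```

Marking nodes when they're pushed rather than when they're popped keeps the explicit stack bounded by the number of objects, and objects that aren't reachable from the root are left untouched.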

I did a little reading on these technologies, and while they didn't turn out to be -- nor did I expect them to be -- a fit for my needs, they got me thinking about how migrating a data set like this one is a complex enough operation that it might need some first-class modeling, beyond just defining the logic of the transformation. The solution I wound up with involves XML-based descriptions of input and output (read: deserialization and serialization) configurations and provisioning real storage for the state of the migration itself. Doesn't feel like too much; hope it's enough.

I've been working on game-related stuff, time permitting. I'm at a point where I can roughly synchronize the movement of a little naked guy walking around a green field (thanks, Liberated Pixel Cup!) between the server and connected clients, and I wanted to add some spatial occlusion to the mix: Areas of the map that both the client and the server understand to be blocked. I knew this wasn't a trivial problem to solve efficiently, so I started doing research on spatial indexing, and found out about...


An R-tree is a container structure for rectangles and associated user data. You search the tree by specifying a target rectangle and a visitor function that gets called for every rectangle in the tree that overlaps your target. Like all tree-based structures, the advantage you get when searching an R-tree derives from the use of branches to hierarchically partition the search space. R-trees use intermediate, covering rectangles to recursively group clusters of spatially-related rectangles. If your target rectangle overlaps a given covering rectangle, it may also overlap one of its covered leaf rectangles; if it doesn't overlap that rectangle, you can safely prune that branch from the search. The secret sauce of a particular R-tree implementation is in the rebalancing algorithm, which generates these covering nodes. A common approach seems to be to iteratively generate some number of covering rectangles that partition their underlying set of constituent rectangles as evenly as possible while minimizing the overlap of the covering set.
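The pruning search is the easy part of an R-tree and fits in a few lines. This is a minimal sketch, not either of the implementations described in this post: the node layout (rtree_node with a fixed fan-out) and the visitor signature are assumptions, and it covers only search, not insertion or the rebalancing that produces the covering rectangles.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { double x1, y1, x2, y2; } rect;  /* assumes x1 <= x2, y1 <= y2 */

#define FANOUT 4

typedef struct rtree_node {
  rect bounds;                          /* covering rectangle for this node */
  bool leaf;
  size_t count;
  rect rects[FANOUT];                   /* leaf entries */
  void *data[FANOUT];                   /* user data for leaf entries */
  struct rtree_node *children[FANOUT];  /* used when !leaf */
} rtree_node;

typedef void (*rtree_visitor)(const rect *r, void *data, void *user);

static bool overlaps(const rect *a, const rect *b) {
  return a->x1 <= b->x2 && b->x1 <= a->x2 && a->y1 <= b->y2 && b->y1 <= a->y2;
}

/* Example visitor: counts matches through the user pointer. */
void count_visitor(const rect *r, void *data, void *user) {
  (void) r;
  (void) data;
  (*(int *) user)++;
}

/* Call visit for every leaf rectangle overlapping target; returns the count. */
size_t rtree_search(const rtree_node *node, const rect *target,
                    rtree_visitor visit, void *user) {
  size_t hits = 0;
  if (!overlaps(&node->bounds, target))
    return 0;                  /* prune: nothing beneath this node can overlap */
  for (size_t i = 0; i < node->count; i++) {
    if (node->leaf) {
      if (overlaps(&node->rects[i], target)) {
        visit(&node->rects[i], node->data[i], user);
        hits++;
      }
    } else {
      hits += rtree_search(node->children[i], target, visit, user);
    }
  }
  return hits;
}
```

How much this saves depends entirely on how tight and non-overlapping the covering rectangles are, which is why the rebalancing strategy matters so much.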

I whipped up a couple of implementations -- one in C with GLib dependencies, one in Scheme in terms of gzochi managed records -- based on my reading of the source code by Melinda Green, available here.


My own usage of this library uncovered another embarrassing issue: Deserializing a message with an embedded message field in r6rs-protobuf 0.6 doesn't work reliably, on account of the way the Protocol Buffers wire protocol directs the deserializer to handle what it perceives as unknown fields (throw 'em away). The fix is to tell a delegate message deserializer exactly how much of a stream it's allowed to read, either explicitly (by passing the delimited length) or by preemptively consuming those bytes and wrapping an in-memory port around them -- which is what I did, to get a patch out as quickly as possible. Find version 0.7 here, if you need it.
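In the Protocol Buffers wire format, an embedded message is a length-delimited field: a base-128 varint byte count followed by that many bytes. The sketch below is in C rather than the library's Scheme. It decodes the length and hands the nested decoder a bounded view of exactly that many bytes, which is the same idea as wrapping the consumed bytes in an in-memory port. The slice type and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A read-only view over a byte range; nested decoders get their own slice. */
typedef struct {
  const uint8_t *buf;
  size_t len;
  size_t pos;
} slice;

/* Decode a base-128 varint: 7 bits per byte, low groups first, high bit = more. */
bool read_varint(slice *s, uint64_t *out) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (s->pos >= s->len)
      return false;                        /* truncated input */
    uint8_t byte = s->buf[s->pos++];
    value |= (uint64_t) (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = value;
      return true;
    }
  }
  return false;                            /* more than 10 bytes: malformed */
}

/* Read a length prefix and carve out a sub-slice of exactly that many bytes,
   advancing the parent past them. The nested decoder cannot read beyond it,
   so it can't swallow the parent's remaining fields as "unknown". */
bool read_delimited(slice *s, slice *sub) {
  uint64_t n;
  if (!read_varint(s, &n) || n > s->len - s->pos)
    return false;
  sub->buf = s->buf + s->pos;
  sub->len = (size_t) n;
  sub->pos = 0;
  s->pos += (size_t) n;
  return true;
}
```

For example, the bytes 03 08 96 01 2a hold a 3-byte embedded message (field tag 0x08 with value 150, encoded as 96 01) followed by 2a, which belongs to the outer message and stays out of the nested decoder's reach.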

14 Jan 2014 (updated 25 Jan 2014 at 13:18 UTC)

Happy new year, everyone. I've just released gzochi version 0.5. Get it here!

As part of the fixes that went into this version, I made several adjustments to the error-handling behavior of the data storage layer, mostly to enable better communication about the result of a query to the database and the ensuing changes in transaction state. Prior to this, I'd taken the approach that any "failure" in data access -- such as a transaction deadlock -- could be indicated by the return value of the data access function, in part to smooth out differences in API between the various storage engines that gzochi supports. This had the fairly obvious disadvantage that it was impossible to tell the difference between a lookup for a non-existent key and an error (well, I figured, application code just shouldn't be doing that), and also that there was no way at all to indicate an error for functions that return void. After tracking down enough hard-to-diagnose bugs, I decided to change the behavior of this API.

In C, your options for returning multiple bits of information to a caller, or for distinguishing between, say, an "empty" response and an error, are limited. You can make the type of data you return more complex by wrapping it inside a data structure that also includes metadata about the invocation; or you can pass pointers to "outvalues" as arguments to your function, and have the callee modify them to indicate errors or other out-of-band responses. I like this latter approach because it mostly preserves the intended interface between the function and its caller. You can, after all, allow people to pass NULL for those outvalues if they don't care about anything besides the return value. It does require, however, that your function reliably check whether the outvalues are non-NULL and possibly allocate storage for them. GLib's GError type and its associated helper functions and macros are very convenient here: Pass a GError ** to your function and use g_set_error to set it conditionally.
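The outvalue pattern can be shown without GLib. In the sketch below, storage_error, set_error and storage_get are hypothetical stand-ins for the roles GError and g_set_error play in real code. A NULL result with no error means the key is absent, a NULL result with an error set means the lookup failed, and callers who don't care can pass NULL for the error argument.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  int code;
  char message[64];
} storage_error;

/* Set *err only if the caller asked for error details (err != NULL),
   mirroring how g_set_error() tolerates a NULL GError **.
   As with GError, *err is expected to be NULL on entry; the caller frees it. */
static void set_error(storage_error **err, int code, const char *message) {
  if (err == NULL)
    return;
  *err = malloc(sizeof **err);
  if (*err == NULL)
    return;
  (*err)->code = code;
  snprintf((*err)->message, sizeof (*err)->message, "%s", message);
}

/* Hypothetical lookup over a tiny fixed table. Returns the value or NULL;
   NULL plus a set error means failure, NULL alone means "not found". */
const char *storage_get(const char *key, storage_error **err) {
  static const char *keys[] = { "alpha", "beta" };
  static const char *values[] = { "1", "2" };
  if (strcmp(key, "deadlock") == 0) {    /* simulated engine failure */
    set_error(err, 1, "transaction deadlock");
    return NULL;
  }
  for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
    if (strcmp(key, keys[i]) == 0)
      return values[i];
  return NULL;
}
```

The function's primary return type stays exactly what it would have been without error reporting, which is the interface-preserving property described above.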

The problem I was trying to solve with the foolishness above still exists, though: gzochi still supports several different storage engines (or, at least, it will -- support for GDBM was removed in this release; future versions will support HamsterDB) and each supported database has its own set of error codes and ways of returning them. So I created a kind of error code independence layer to convert implementation-specific values to values that are part of that layer. For example, in BerkeleyDB:
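Here's a hypothetical sketch of that kind of translation layer, not gzochi's actual code. Engine-specific return codes are mapped onto a small engine-neutral set at the storage boundary, so the rest of the server only ever sees the neutral values. The ENGINE_* constants are placeholders for a real engine's codes (for Berkeley DB, things like DB_NOTFOUND and DB_LOCK_DEADLOCK from db.h); their numeric values here are made up and don't match any real library.

```c
/* Engine-neutral status codes seen by the rest of the server (hypothetical). */
typedef enum {
  STORAGE_OK,
  STORAGE_NOT_FOUND,
  STORAGE_DEADLOCK,
  STORAGE_FAILURE
} storage_status;

/* Placeholder native codes for one engine; NOT the real Berkeley DB values. */
enum {
  ENGINE_OK = 0,
  ENGINE_NOTFOUND = -1,
  ENGINE_LOCK_DEADLOCK = -2
};

/* Translate a native return code at the storage-engine boundary.
   Anything unrecognized becomes a generic failure rather than leaking through. */
storage_status translate_engine_status(int rc) {
  switch (rc) {
  case ENGINE_OK:            return STORAGE_OK;
  case ENGINE_NOTFOUND:      return STORAGE_NOT_FOUND;
  case ENGINE_LOCK_DEADLOCK: return STORAGE_DEADLOCK;
  default:                   return STORAGE_FAILURE;
  }
}
```

With one such function per engine, transaction code can, for example, retry on STORAGE_DEADLOCK no matter which database produced the deadlock.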


Back to the release: There's also a new "managed" data structure that employs the principles behind Project Darkstar's ScalableList and provides SRFI-44 collection semantics (one of the more contentious SRFI discussions I've read); and enhancements to the web monitoring console. Like I said, go get it!

Further adventures in game development: I've been working with Clutter to create a primitive client-side game engine to integrate with gzochi. Clutter's a 2-D scene graph library for writing OpenGL applications in C. It's the canvas library behind Mutter and, from what I can tell, a bunch of next-level GTK+ stuff. The documentation says it's for building "visually rich graphical user interfaces," but I don't see why you couldn't also use it to, say, manage the layout of a game screen, which is what I'm trying to use it for: There are a lot of things I don't miss from my brief career as a professional Flash developer, but computing dirty rectangles and figuring out the draw order for actors in a scene isn't something I was looking forward to writing myself. Finding Clutter felt serendipitous.

One thing Clutter doesn't come with, though, is a component that can render frame-based animations, which I'm modeling as sets of raster images that get painted on the stage in sequence at a specified rate. I've built some components that can load image atlases from disk or over the web and carve them up into frames in memory, and I've been trying to get Clutter to turn those frames into a flipbook. It took a lot of meditating over the documentation and some coaching from Clutter developers on IRC for me to grasp that doing animations this way wasn't going to be a misuse of the library -- but it would be something I'd have to write myself.
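The bookkeeping for such a flipbook doesn't depend on Clutter. Assuming (hypothetically) that an atlas is a uniform grid of equally sized frames stored row by row, this sketch works out which frame should be showing after a given elapsed time at a given frame rate, looping, and where that frame sits in the atlas in pixels. In a Clutter program the elapsed time could come from a ClutterTimeline; the names here are illustrative.

```c
/* Pixel rectangle of one frame inside an atlas image. */
typedef struct { int x, y, width, height; } frame_rect;

/* Hypothetical atlas description: a grid of equally sized frames. */
typedef struct {
  int frame_width, frame_height;
  int columns;    /* frames per atlas row; must be > 0 */
  int n_frames;   /* total frames in the animation */
} atlas;

/* Which frame should be showing after elapsed_ms at fps, looping forever. */
int frame_at(const atlas *a, long elapsed_ms, int fps) {
  if (a->n_frames <= 0 || fps <= 0 || elapsed_ms < 0)
    return 0;
  long frames_elapsed = elapsed_ms * fps / 1000;
  return (int) (frames_elapsed % a->n_frames);
}

/* Locate frame index in the atlas, reading the grid left to right, top to bottom. */
frame_rect frame_rect_for(const atlas *a, int index) {
  frame_rect r;
  r.x = (index % a->columns) * a->frame_width;
  r.y = (index / a->columns) * a->frame_height;
  r.width = a->frame_width;
  r.height = a->frame_height;
  return r;
}
```

Deriving the frame from elapsed time, rather than advancing a counter on each tick, keeps the animation at the right speed even when ticks are dropped.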

Clutter does ship with some data structures you can use for displaying images, so that's what I tried first. There are two: ClutterTexture, which is a subclass of ClutterActor; and ClutterImage, which implements the ClutterContent interface and can thus be used to render ClutterActors that don't know how to render themselves. I saw some indication online that ClutterTexture was probably on its way out, so my first implementation of the animation system used a handler for the "new-frame" signal on an animation-specific ClutterTimeline to update the animated actor's ClutterImage content with the pixel data for each new frame (via clutter_image_set_data). To my surprise, it actually worked (go Clutter!) but it also drove the CPU utilization on my netbook up to about 60%. Here's why: Since Clutter gets its drawing done via OpenGL (actually via Cogl, a GL/GLES compatibility layer), before you can display an image to the framebuffer, Clutter wants to get a handle to a CoglTexture, a block of allocated (and filled-out) GPU texture memory. Pushing textures to the GPU is expensive, so doing it every frame is pretty much a non-starter.

The next thing I tried was maintaining a roster of ClutterImage objects, each with static, pre-uploaded data for a single frame of animation, and updating the content of my sprite actor with the requisite image (via clutter_actor_set_content) on every new frame. This was an improvement, but only got me down to about 40% utilization. I suspect that changing the content at the actor level triggers unnecessary invalidations of the scene graph that are expensive to recalculate, but I didn't investigate deeply.

At the suggestion of someone on IRC, the next thing I tried was creating my own implementation of the ClutterContent interface such that the paint_content function would use a pre-uploaded Cogl texture to paint each required frame. The paint_content function has the prototype:

void (*) (ClutterContent *, ClutterActor *, ClutterPaintNode *);

...where the ClutterPaintNode argument is a local root of the render graph that corresponds to the actor to which the content is attached (content can be shared across multiple actors). To get your content painted, you need to attach a child paint node to that root. The ClutterPaintNode interface doesn't expose any primitive drawing callbacks, so I don't think they want you to write your own, but there's a provided implementation (ClutterTexturePaintNode) that paints the contents of the texture you give it. I built my ClutterContent implementation around this paint node type -- and it worked, but I was still at around 30% CPU. When I profiled my application, I found that the invalidation signal was being fired much more frequently than the new-frame signal, and the default mechanism for handling invalidation -- which I wasn't sure I wanted to override -- queues a paint operation.

I was feeling a bit down in the mouth about the prospects of animating my sprite at 20% CPU or less. I started Googling various combinations of "clutter" and "animation," and somehow arrived at the Gitorious page for Rob Staudinger's clutter-sprite project, which promises obliquely to provide "A sprite actor for clutter." It didn't look like much, but I started rifling the source files to see if I could learn something. Sure enough, I found that Rob cleverly overrides ClutterActor's paint function and uses cogl_rectangle to paint the frame of animation directly from an image stored as a CoglMaterial, skirting the Clutter render tree entirely. Using a variation on his technique, I was able to get my utilization down to about 25%, which seems like it might be as low as I'm going to get with my netbook's GPU and this version of Clutter.
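Painting one frame straight from an atlas texture comes down to picking normalized texture coordinates. Cogl 1.x's cogl_rectangle_with_texture_coords takes a destination rectangle plus tx1, ty1, tx2, ty2 in the 0..1 range. The helper below is hypothetical (it isn't taken from clutter-sprite) and computes those four values for a frame in a uniform grid atlas, so a paint override only has to pass them along.

```c
/* Normalized (0..1) texture coordinates for one sub-rectangle of a texture. */
typedef struct { float tx1, ty1, tx2, ty2; } tex_coords;

/* Coordinates of frame index in an atlas texture of tex_width x tex_height
   pixels, laid out as a grid of columns frames of frame_width x frame_height. */
tex_coords frame_tex_coords(int index, int columns,
                            int frame_width, int frame_height,
                            int tex_width, int tex_height) {
  int col = index % columns;
  int row = index / columns;
  tex_coords c;
  c.tx1 = (float) (col * frame_width) / tex_width;
  c.ty1 = (float) (row * frame_height) / tex_height;
  c.tx2 = (float) ((col + 1) * frame_width) / tex_width;
  c.ty2 = (float) ((row + 1) * frame_height) / tex_height;
  return c;
}
```

Inside a paint override, with the atlas set as the current source (for example via cogl_set_source with a material holding the texture), a call like cogl_rectangle_with_texture_coords (0, 0, w, h, c.tx1, c.ty1, c.tx2, c.ty2) would draw the frame at the actor's size. The texture is uploaded once, and only these four floats change from frame to frame.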



Others have certified joolean as follows:

  • lerdsuwa certified joolean as Apprentice
  • badvogato certified joolean as Journeyer
  • mako certified joolean as Apprentice
  • aicra certified joolean as Apprentice
  • lkcl certified joolean as Apprentice
  • ara0bswft16 certified joolean as Apprentice
  • dangermaus certified joolean as Master
