23 Jul 2009 company   » (Master)

semantic desktop

I named this post “Tracker” first as I started writing from that perspective, but the problems I’m about to talk are more related to what is called “semantic desktop” and not specific to Tracker, which is just the GNOME implementation to that idea.
This post is a collection of my thoughts on this whole topic. What I originally wanted to do was improve Epiphany’s history handling. Epiphany still deletes your history after 10 days for performance reasons. When people suggesting Tracker I started investigating it, both for this purpose and in general.

How did this all start?

It gained traction when people realized that a lot of places on the desktop refer to the same things, but they all do it incompatibly. (me, me, me, me, me and I might be in your IRC, mail program and feed reader, too.) So they set out to change it. Unfortunately, such a change would have required changes to almost all applications. And that is hard. An easier approach is to just index all files and collect the information from them without touchig the applications. And thus, Tracker and Beagle were born and competed on doing just that.
However, indexing has lots of problems. Not only do you need to support all those ever-changing file formats, you also need to do the indexing. And that takes lots of CPU and IO and is duplicated work and storage. So the idea was born to instead write plugins that grab the information from the applications while they are running.
But still, people weren’t convinced, as the only things they got from this is search tools, even if they automatically update. And their data is still duplicated.

What’s a sematic desktop anyway?

Well, it’s actually quite smple. It’s just a bunch of statements in the form <subject> <predicate> <object> (called triples), like “Evolution sends emails”. Those statements come complete with a huge spec and lots of buzzwords, but it’s always just about <subject> <predicate> <object>.
Unfortunately, statements don’t help you a whole lot, there’s a huge difference between “Evolution sends emails” and “I send emails”. You need a dictionary (called ontology). The one used by Tracker is the Nepomuk ontology.
And when you have stored lots of triples stored according to your ontologies, then you can query them (using SPARQL). See Philip’s posts (1, 2) for an intro.

So why is that awesome?

If all your data is stored this way, you can easily access information from other applications without having to parse their formats. And you can easily talk to the other applications about that data. So you can have a button in evolution for IM or one in empathy to send emails. Implementing something like Wave should be kinda trivial.
And of course, you get an awesome search.

No downsides?

Of course there are downsides. For a start, one has to agree on the ontologies: How should all the data be represented? (Do we use a model like OOXML, ODF or HTML for storing documents?) Then there also is the question about security. (Should all apps be able to get all passwords”? Or should everyone be able to delete all data?) It’s definitely not easy to get right.
How does Tracker help?

Tracker tries to solve 2 problems: It tries to supply a storage and query daemon for all the data (called a triple-store) and it tries to solve the infrastructure to indexing files. The storage backend makes sense. Its architecture is sound, it’s fast and you can send useful queries its way. It has a crew of coders developing it that know their stuff. So yes, it’s the thing you want to use. Unless you don’t buy in to the semantic desktop hype.

What about the indexing?

Well, the whole idea of indexing the files on my computer is problematic. The biggest problem I have is thatthe Tracker people lack the expertise to know what data to index and how. It doesn’t help a whole lot if Tracker parses all JPEG files in the Pitures/ folder when the real data is stored in F-Spot. It doesn’t help a whole lot when you have Empathy and Evolution plugins that both have a contact named Kelly Hildebrand, but you don’t know if they’re talking about the same person. You just end up with a bunch of unrelated data.

There was something about hype?

Yeah, the semantic desktop has been an ongoing hype for a while without showing great results. Google Desktop is the best example, Beagle was supposed to be awesome but hasn’t had a release for a while, let alone Dashboard, and it hasn’t caught on in GNOME, either, even though we talk about it for more than 3 years.
But then, Nokia still builds on Tracker for Harmattan, the Zeitgeist team tries to collect data from multiple applications and make use of it in innovative ways. People are definitely still trying. But it’s not clear to me that anyone has figured out a way to make use of it yet.

Now, do I want to use it in my application or not?

Tracker is not up to the quality standards people are used from GNOME software. It’s an exciting and rapidly changing code base with lots of downright idiotic behaviors - like crashing when it accidentally takes more than 80MB memory while parsing a large file - and unsolved problems. Some parts don’t compile, the API is practically not documented and the dependancy list is not small (at least if you wanna hack on it). It also ships a tool to delete your database. Which is nice for debugging, but somewhat like shipping a tool that does rm -rf ~. In summary, I feel remembered of the GStreamer 0.7 or Gtk 1.3 days. Products with solid foundations, a potentially bright future ahead but not there yet. So it’s at best beta quality. I’d call it alpha.
There is an active development team, but that team is focused on the next Maemo release and not on desktop integration. This is quite important, because it likely means that the development focus will probably only be on new applications for the Maemo platform and not on porting old applications (in particular GNOME ones) to use Tracker. And that in turn means there will not be deeper desktop integration. Unless someone comes up and works on it.

So I don’t want to use Tracker?

The idea of the semantic desktop has great potential. if every application makes its data available for every other application in a common data store that everybody agrees on, you can get very nice integration of that data. But that requires that it’s not treated as an add-on that crawls the desktop itself, but that applications start using Tracker as their exclusive primary data store. Until EDS is just a compatibility frontend for Tracker, it’s not there yet.
So if you use Tracker, you will not have to port yor application to use it in the future, when GNOME requires it. You also get rid of the Save button in your application and gain automatic backup, crash recovery and full text search. But if you don’t use Tracker, you don’t save your data on unfinished software, and you don’t have to rip it out when Nokia (or whoever) figures out that Tracker is not the future.

Conclusion

I have no idea if Tracker or the semantic desktop is the right idea. I don’t even know if it is the right idea for a new Epiphany history backend. It’s probably roughly the same amount of work I have to do in both cases. But I’m worried about its (self)perception as an add-on instead of as an integral part of every application.

Syndicated 2009-07-23 19:34:55 from Swfblag

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!