29 May 2011 mhausenblas   » (Journeyer)

Ye shall not DELETE data!

This is the first post in the solving-tomorrow’s-problems-with-yesterday’s-tools series.

Alex Popescu recently reviewed a post by Mikayel Vardanyan on Picking the Right NoSQL Database Tool and was puzzled about the following of Mikayel’s statement:

[Relational database systems] allow versioning or activities like: Create, Read, Update and Delete. For databases, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and note duly the previous value for that record.

I don’t find it puzzling at all. As Pat Helland rightly says:

In large-scale systems, you don’t update data, you add new data or create a new version.

OK, I guess arguing this on an abstract level serves nobody. Let’s get our hands dirty and have a look at a concrete example. I pick an example from the Linked Data world, but there is nothing really specific to it – it just happens to be the data language I speak and dream in ;)

Look at the following piece of data:

… and now let’s capture the fact that my address has changed …

This looks normal at first sight, but there are two drawbacks attached with it:

  1. If I ask the question: ‘Where has Michael been living previously?’, I can’t get an answer anymore once the update has been performed, unless I have a local copy of the old data piece.
  2. Whenever I ask the question: ‘Where does Michael live?’ I need to implicitly add ‘at the moment’, as the information is not scoped.

There are few ways one can deal with it, though. And as a consequence, here is what I demand:

  • Never ever DELETE data – it’s slow and lossy; also updating data is not good, as UPDATE is essentially DELETE + INSERT and hence lossy as well.
  • Each piece of data must be versioned – in the Linked Data world one could, for example, use quads rather than triples to capture the context of the assertion expressed in the data.

Oh, BTW, my dear colleagues from the SPARQL Working Group – having said this, I think SPARQL Update is heading in the wrong direction. Can we still change this, pretty please?

PS: disk space is cheap these days, as nicely pointed out by Dorian Taylor ;)


Filed under: Big Data, Cloud Computing, Linked Data, NoSQL, Proposal, W3C

Syndicated 2011-05-29 09:07:39 from Web of Data

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!