8 Jul 2010 ingvar   » (Master)

I was reading titus's recent post about data structures for k-mer filtering.

ONe thing struct me. Titus, you looked at binary Bloom filters, but seemingly ignored Counting Bloom filters (where instead of indexing a bit with your hashes, you index a counter of a suitable size, you do run into problems if you need both increments and decrements, since once you it max on the counter, you can no longer decrement).

Of course, it's not necessarily trivial going from a Bloom filter (counting or not) to the key, whereas the key is typically available when using a normal hash table.

I suspect, but have not verified, that a counting Bloom combined with the parallelization opportunity you found (and that one is the big win, I think) would be an equally interesting approach.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!