Dear lazyweb number 3.
So far, I've asked:
high latency net simulations - great answers.
python friendly back-end accessible search engines - many answers, none that fit the bill. So I wrote my own :).
Today, I shall ask - is there a python-accessible persistent b+tree(or hashtable, or ...) module. Key considerations:
- scaling: millions of nodes are needed with low latency access to a nodes value and to determine a nodes absence
- indices are write once. (e.g. a group of indices are queried, and data is expired altered by some generational tactic such as combining existing indices into one larger one and discarding the old ones)
- reading and writing is suitable for sharply memory constrained environments. ideally only a few 100KB of memory are needed to write a 100K node index, or to read those same 100K nodes back out of a million node index. temporary files during writing are fine.
- backend access must either be via a well defined minimal api (e.g. 'needs read, readv, write, rename, delete') or customisable in python
- easy installation - if C libraries etc are needed they must be already pervasively available to windows users and Ubuntu/Suse/Redhat/*BSD systems
- ideally sorted iteration is available as well, though it could be layered on top
- fast, did I mention fast?
- stable formats - these indices may last for years unaltered after being written, so any libraries involved need to ensure that the format will be accessible for a long time. (e.g. python's dump/marshal facility fails)
sqlite, bdb already fail at this requirements list.
snakesql, gadfly, buzhug and rbtree fail too.
FOAF updates: Trust rankings are now exported, making the data available to other users and websites. An external FOAF URI has been added, allowing users to link to an additional FOAF file.
Keep up with the latest Advogato features by reading the Advogato status blog.
If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!