Name: Ben Martin
Member since: 2001-11-04 13:15:24
Last Login: N/A
Homepage: http://witme.sourceforge.net/libferris.web/
Notes: Save Ferris!
libferris is the virtual filesystem / semantic data manager. If you want to mount XML, Evolution, Firefox, or PostgreSQL as a filesystem, then it's all been done before ;) Ego is the file manager / data interaction tool.
An interesting experiment (though truly a sad result): I have 90+ freshmeat subscribers to the libferris project. Assuming that half of them are starving students on peasant level one income, and that I'm subscribed to the project as well, this leaves 40+ folks who could possibly make a $10-$20 donation to the libferris project.
Now surely that money is not much, and it is laughable in comparison to what an experienced C++ coder with similar qualifications to myself could earn commercially on closed source. But relatively speaking, the potential $500/yr that a small donation from half my subscriber base would bring would be great for my current situation. It also gives some warm fuzziness that folks are getting what libferris is about.
/me waves money pan.
See the top of this page for a screen animation.
I noticed on planet KDE this post on desktop search. It mentioned not using xattr for metadata because some filesystems don't support it. I'd say that most filesystems don't: ISO filesystems, NFS (depending on setup), http, ftp and xemacs don't. The simple solution to all this in libferris is to virtualise xattr, just like the filesystem itself is virtual. So you store xattr in RDF when the underlying filesystem doesn't allow it.
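The fallback idea can be sketched roughly as follows. This is not the libferris API: `AttrStore`, `NoXattrStore`, `SideStore` and `VirtualXattr` are hypothetical names, and the in-memory map only stands in for the RDF store mentioned above.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical interface; none of these names come from libferris.
struct AttrStore {
    virtual ~AttrStore() {}
    // Returns false when the backend cannot store the attribute.
    virtual bool set(const std::string& path, const std::string& key,
                     const std::string& value) = 0;
    // Returns false when the attribute is not available from this backend.
    virtual bool get(const std::string& path, const std::string& key,
                     std::string& out) const = 0;
};

// Stand-in for a filesystem without xattr support (ISO image, http, ...).
struct NoXattrStore : AttrStore {
    bool set(const std::string&, const std::string&,
             const std::string&) override { return false; }
    bool get(const std::string&, const std::string&,
             std::string&) const override { return false; }
};

// Stand-in for the side store: an in-memory map where the post uses RDF.
struct SideStore : AttrStore {
    std::map<std::pair<std::string, std::string>, std::string> data;

    bool set(const std::string& path, const std::string& key,
             const std::string& value) override {
        data[std::make_pair(path, key)] = value;
        return true;
    }
    bool get(const std::string& path, const std::string& key,
             std::string& out) const override {
        auto it = data.find(std::make_pair(path, key));
        if (it == data.end()) return false;
        out = it->second;
        return true;
    }
};

// Tries the native backend first and falls back to the side store.
class VirtualXattr {
public:
    VirtualXattr(AttrStore& native, AttrStore& fallback)
        : native_(native), fallback_(fallback) {}

    bool set(const std::string& path, const std::string& key,
             const std::string& value) {
        return native_.set(path, key, value) || fallback_.set(path, key, value);
    }
    bool get(const std::string& path, const std::string& key,
             std::string& out) const {
        return native_.get(path, key, out) || fallback_.get(path, key, out);
    }
private:
    AttrStore& native_;
    AttrStore& fallback_;
};
```

The caller only ever talks to `VirtualXattr`, so it never needs to know whether the attribute ended up on the real filesystem or in the side store.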
I should also highlight that the tagging mentioned in the post referenced by the above post is already available and usable with libferris :) You can attach arbitrary metadata to virtual filesystem objects, index them, and search based on that metadata. Indexing can be done in many formats: Lucene, PostgreSQL, RDF using Redland (db4, SQLite, PostgreSQL), or on an LDAP server.
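To make the attach / index / search cycle concrete, here is a toy sketch. `TagIndex` is a hypothetical in-memory class and not part of libferris, which indexes into the backends listed above; the URLs in the usage note are made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory index keyed on (attribute name, value).
class TagIndex {
public:
    // Attach key=value metadata to an object identified by its URL.
    void attach(const std::string& url, const std::string& key,
                const std::string& value) {
        index_[std::make_pair(key, value)].insert(url);
    }

    // Return every URL that carries exactly this key=value pair.
    std::set<std::string> search(const std::string& key,
                                 const std::string& value) const {
        auto it = index_.find(std::make_pair(key, value));
        if (it == index_.end()) return std::set<std::string>();
        return it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::set<std::string>> index_;
};
```

For example, after `idx.attach("file:///tmp/a.txt", "tag", "holiday")`, a call to `idx.search("tag", "holiday")` returns a set containing that URL, whatever kind of object the URL points at.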
Yes folks, it's true, you can now mount libferris through the kernel using fuse. The goodness of your xemacs session becoming a kernel filesystem, mounting firefox through ferris and fuse... mmm, filesystems ease the pain.
Still trying to get some more advanced articles about libferris usage out there. Things are starting to get rather interesting now, because the stacked filesystems in libferris and their ultimate exposure through fuse let you do some rather funky things with data that comes from (and returns to) many and varied places.
Recently a question was posed to me, to which I offered a reasonably off-the-cuff response. This led to an interesting debate about whether set<string> was going to be hugely slower than hash_set<string> for the exact case where hash_set<> should whip a red-black tree's butt: direct lookups.
So without going into that conversation, I decided to benchmark the two containers from both libstdc++ and stlport 4.x. This is using gcc 4.0.2, which is shameful as I should have a more recent gcc. I'll likely rerun it on icc and gcc 4.1.x as well.
The core of the code reads strings from cin and shoves them into a std::list. For the set<> runs I build a set from the list (which will have dups) and then iterate the list 50 times, looking up each entry (including dups again) in the built set<> or hash_set<>.
There is of course some cruft there to select the right container from libstdc++ and stlport, because hash_set is non-standard.
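    // l is the std::list of words read from cin; LOOKUPS is 50.
    if( use_hash )
    {
        l_t::iterator e = l.end();
        // build the hash_set from the list (dups collapse)
        for( l_t::iterator iter = l.begin(); iter != e; ++iter )
            hstrset.insert( *iter );
        // look up every list entry LOOKUPS times
        for( int i=0; i<LOOKUPS; ++i )
            for( l_t::iterator iter = l.begin(); iter != e; ++iter )
                hstrset.find( *iter );
    }
    else
    {
        l_t::iterator e = l.end();
        // build the set from the list (dups collapse)
        for( l_t::iterator iter = l.begin(); iter != e; ++iter )
            strset.insert( *iter );
        // look up every list entry LOOKUPS times
        for( int i=0; i<LOOKUPS; ++i )
            for( l_t::iterator iter = l.begin(); iter != e; ++iter )
                strset.find( *iter );
    }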
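For anyone wanting to repeat the test on a current compiler, the same loop can be sketched without the container-selection cruft. This is not the original program: it assumes C++11, swaps the non-standard hash_set for std::unordered_set, and the names `count_hits` and `time_lookups` are made up for illustration. It also counts the successful finds so the optimizer cannot throw the lookups away.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <set>
#include <string>
#include <unordered_set>

// Builds a Set from the words (dups collapse), then looks up every list
// entry `lookups` times. Returns the number of successful finds.
template <typename Set>
std::size_t count_hits(const std::list<std::string>& words, int lookups) {
    Set s(words.begin(), words.end());
    std::size_t hits = 0;
    for (int i = 0; i < lookups; ++i)
        for (const std::string& w : words)
            if (s.find(w) != s.end())
                ++hits;
    return hits;
}

// Wall-clock seconds for building the Set and doing all the lookups.
template <typename Set>
double time_lookups(const std::list<std::string>& words, int lookups,
                    std::size_t& hits) {
    auto start = std::chrono::steady_clock::now();
    hits = count_hits<Set>(words, lookups);
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `time_lookups<std::set<std::string>>(words, 50, hits)` and then `time_lookups<std::unordered_set<std::string>>(words, 50, hits)` on a list filled from std::cin gives the same shape of comparison, though the numbers will not be comparable with the gcc 4.0.2 runs.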
So, the benchmarks, all compiled with -O9 (which gcc treats the same as -O3). Other gcc options don't seem to have any real effect. I created input from Gutenberg files; l.size is the number of words read. The hash_set methods are quicker for the completely degenerate case of only doing direct lookups and doing each of them at least 50 times per unique word in the input.
Perhaps the most interesting point is the difference in speed between stlport and libstdc++ for this. I am now very interested to see how stlport5.x compares.
# Using stdc++::set<>
foo$ time cat /tmp/largetxt.txt | ./string_xset
l.size:273435 use_hash:0
real 0m16.980s
user 0m16.493s
sys 0m0.028s

# Using stlport::set<>
foo$ time cat /tmp/largetxt.txt | ./string_xset_stlport
l.size:273435 use_hash:0
real 0m10.184s
user 0m9.821s
sys 0m0.084s

# Using stdc++::hash_set<>
foo$ time cat /tmp/largetxt.txt | ./string_xset 1
l.size:273435 use_hash:1
real 0m4.061s
user 0m3.868s
sys 0m0.024s

# Using stlport::hash_set<>
foo$ time cat /tmp/largetxt.txt | ./string_xset_stlport 1
l.size:273435 use_hash:1
real 0m2.430s
user 0m2.328s
sys 0m0.012s
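Putting the "real" times from the runs above side by side: hash_set comes out roughly 4.2x faster than set<> in both libraries, and stlport is roughly 1.7x faster than libstdc++ for both containers. The arithmetic, with made-up constant names holding the measured values:

```cpp
// "real" times in seconds, copied from the four runs above.
const double kStdSet      = 16.980;  // stdc++::set<>
const double kStlportSet  = 10.184;  // stlport::set<>
const double kStdHash     = 4.061;   // stdc++::hash_set<>
const double kStlportHash = 2.430;   // stlport::hash_set<>

// How many times faster the `fast` run is than the `slow` run.
double speedup(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}
```

So `speedup(kStdSet, kStdHash)` and `speedup(kStlportSet, kStlportHash)` give the hash_set advantage within each library, while `speedup(kStdSet, kStlportSet)` and `speedup(kStdHash, kStlportHash)` give the stlport advantage for each container.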