28 Nov 2004 mdupont   » (Master)

I have now been able to get the introspector Perl scripts to run on the output of rdfproc, a part of Redland. All you need to use this now is Redland, and there are Debian packages for it. You can use many tools on this RDF; take a look at http://librdf.org for more information.

You are going to want these packages for Debian:
librdf-perl - Perl language bindings for the Redland RDF library
librdf0 - Redland RDF Application Framework
librdf0-dev - Redland RDF library development libraries and headers
libraptor1 - Raptor RDF Parser library
libraptor1-dev - Raptor RDF parser and serializer development libraries and headers

Here are some good example data files: c-dump ntriples rdfxml example

These are two serializations of RDF, N-Triples and RDF/XML. You can use them with the introspector like this (the example uses the N-Triples file):

1. Unzip the file: gunzip c-dump.rdf.gz

2. Make a Redland repository: rdfproc Global parse ntriples file:/
   Global is the name of the repository, and file:/ is the base address, which can be whatever URI you want.

That will create a repository in the current directory using Berkeley DB:
6.2M Global-po2s.db -- predicate-object index (used to find by field)
9.0M Global-so2p.db -- subject-object index (not used)
9.5M Global-sp2o.db -- subject-predicate index (graph traversal)
25M total

So you have about 25 MB of indexes (up to 9.5 MB each) for a 500K gzipped N-Triples file.
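To make the role of the three indexes concrete, here is a toy in-memory model in Python. It is only an illustration of the idea behind the sp2o, po2s and so2p hashes listed above, not Redland's actual Berkeley DB layout; the class name `TripleIndex` and the predicate/value names in the demo are hypothetical.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory model of the three Redland hash indexes (illustration only)."""

    def __init__(self):
        # (subject, predicate) -> objects: follow edges out of a node (graph traversal)
        self.sp2o = defaultdict(set)
        # (predicate, object) -> subjects: find nodes by field value
        self.po2s = defaultdict(set)
        # (subject, object) -> predicates: which predicates link two nodes
        self.so2p = defaultdict(set)

    def add(self, s, p, o):
        # Every triple is written once into each index.
        self.sp2o[(s, p)].add(o)
        self.po2s[(p, o)].add(s)
        self.so2p[(s, o)].add(p)

    def objects(self, s, p):
        # .get() avoids creating empty entries on a miss.
        return self.sp2o.get((s, p), set())

    def subjects(self, p, o):
        return self.po2s.get((p, o), set())

    def predicates(self, s, o):
        return self.so2p.get((s, o), set())


if __name__ == "__main__":
    idx = TripleIndex()
    idx.add("_:n1", "type", "function_decl")  # hypothetical predicate/value names
    idx.add("_:n2", "type", "function_decl")
    idx.add("_:n1", "name", "_:n3")
    print(sorted(idx.subjects("type", "function_decl")))  # ['_:n1', '_:n2']
    print(idx.objects("_:n1", "name"))                   # {'_:n3'}
```

Because each triple is stored once per index, keeping all three indexes multiplies the storage, which is part of why the index files end up larger than the N-Triples text itself.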

The unpacked sizes are:
13M Nov 28 15:34 c-dump.rdf
4.7M Nov 28 15:34 c-dump.ntriples

wc (word count) on c-dump.ntriples gives 96,818 lines, 387,292 words and 4,846,776 characters.
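In N-Triples each line holds exactly one triple, so the line count from wc is also roughly the triple count, assuming the file has no blank or comment lines. The sketch below is a simplified line parser written for illustration. It accepts IRIs, blank nodes and literals (with an optional language tag or datatype) in any position and does not unescape anything, so it is not a full N-Triples implementation.

```python
import re

# One RDF term: <iri>, _:blank, or "literal" with optional @lang / ^^<datatype>.
TERM = r'(<[^>]*>|_:[A-Za-z0-9_.\-]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9\-]+|\^\^<[^>]*>)?)'
TRIPLE = re.compile(r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s*\.\s*$')


def parse_line(line):
    """Return (subject, predicate, object) for a triple line, None for blank/comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = TRIPLE.match(stripped)
    if match is None:
        raise ValueError(f"not a simple N-Triples line: {line!r}")
    return match.groups()


def count_triples(lines):
    """Count the lines that actually hold a triple."""
    return sum(1 for line in lines if parse_line(line) is not None)


if __name__ == "__main__":
    sample = [
        '_:n1 <file:/type> "function_decl" .',
        "",
        "# a comment",
        '_:n1 <file:/name> _:n2 .',
    ]
    print(count_triples(sample))  # 2
```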

The original source file, c-dump.i (expanded with headers), has 13,270 lines, 27,221 words and 260,051 characters (254K from ls).

So we are talking about at least an order-of-magnitude (10x) increase in size for indexing.
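The rough 10x figure can be checked against the measurements above by computing the growth factor for each measure. The script only reuses the numbers quoted in this entry. The index comparison assumes ls-style units (1M = 1024K), which is an assumption about how the sizes were reported.

```python
# Measurements copied from the text above.
SOURCE = {"lines": 13_270, "words": 27_221, "chars": 260_051}      # c-dump.i
NTRIPLES = {"lines": 96_818, "words": 387_292, "chars": 4_846_776}  # c-dump.ntriples

# Index sizes in MB from the Global-*.db listing; source size 254K from ls.
INDEX_MB = [6.2, 9.0, 9.5]
SOURCE_KB = 254


def ratios(big, small):
    """Per-measure growth factor, rounded to one decimal place."""
    return {key: round(big[key] / small[key], 1) for key in small}


def index_vs_source():
    """Total index size divided by source size (assumes 1M = 1024K)."""
    return round(sum(INDEX_MB) * 1024 / SOURCE_KB, 1)


if __name__ == "__main__":
    print(ratios(NTRIPLES, SOURCE))  # {'lines': 7.3, 'words': 14.2, 'chars': 18.6}
    print(index_vs_source())         # 99.6
```

By these numbers the N-Triples text is about 7x the source in lines, 14x in words and 19x in characters, while the three indexes together come to roughly 100x the size of c-dump.i.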

For example, I have installed the introspector in my home directory: /home/mdupont/EXPERIMENTS/introspector/introspector-0.7
The CVS version is up to date, and you can download the release from sf.net.

So, to use it, go to the directory containing the RDF database files and run:
perl -I/home/mdupont/EXPERIMENTS/introspector/introspector-0.7 ~/EXPERIMENTS/introspector/introspector-0.7/recurse5.pl node_types:function_decl file:/

node_types:function_decl is the node type I am looking for; other interesting ones can be found in the Introspector/GCCTypes.pm file.
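recurse5.pl is not reproduced here. As a hypothetical illustration of the general pattern the two used indexes support, the sketch below first uses a predicate-object lookup (like po2s) to find every node of a given type, then does a breadth-first walk over subject-predicate edges (like sp2o) from each one. The predicate name `rdf:type`, the blank-node names and the `max_depth` cutoff are all assumptions made for the example, not the introspector's actual vocabulary or behaviour.

```python
from collections import defaultdict, deque

# Hypothetical vocabulary: the introspector's real predicate names may differ.
TYPE_PRED = "rdf:type"


def build_indexes(triples):
    by_subject = defaultdict(list)  # like sp2o: subject -> [(predicate, object)]
    by_po = defaultdict(set)        # like po2s: (predicate, object) -> subjects
    for s, p, o in triples:
        by_subject[s].append((p, o))
        by_po[(p, o)].add(s)
    return by_subject, by_po


def nodes_of_type(by_po, node_type):
    """Find-by-field lookup: every subject whose type is node_type."""
    return sorted(by_po.get((TYPE_PRED, node_type), set()))


def reachable(by_subject, start, max_depth=3):
    """Breadth-first walk from start along subject -> object edges."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:
            continue
        for pred, obj in by_subject.get(node, []):
            if pred == TYPE_PRED:
                continue  # don't walk into the type nodes themselves
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return order


if __name__ == "__main__":
    triples = [
        ("_:f1", TYPE_PRED, "node_types:function_decl"),
        ("_:f1", "name", "_:id1"),
        ("_:id1", TYPE_PRED, "node_types:identifier_node"),
        ("_:f2", TYPE_PRED, "node_types:function_decl"),
    ]
    by_subject, by_po = build_indexes(triples)
    for fn in nodes_of_type(by_po, "node_types:function_decl"):
        print(fn, reachable(by_subject, fn))
```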

I hope that you take some time and play around with the introspector. It is not running perfectly, but it is fast!
