8 Dec 2004 titus   » (Journeyer)

The only problem with troubleshooting is that trouble sometimes shoots back. -- Joe Zeff.

I've been noticing a fair amount of commentary on Python and Java lately: I particularly enjoyed Bruce Eckel's take on Static vs Dynamic typing, and Phillip Eby's Python Is Not Java (and Java Is Not Python, either). Phillip Eby makes the point that the Python and Java mindsets are quite different when it comes to frameworks: Python programmers tend to develop the structure out as they need it, while Java designers try to specify the frameworks' structure first & then fill in with specific implementations. Isn't this antithetical to the agile programming paradigm that's been gaining popularity lately?

Jython does a nice job of mingling Java libraries with Python coding; I think many of the Python-native extension modules can be loaded directly by Jython, too. Is this a possible solution to the question of static vs dynamic typing -- build your software in a language like Jython, and then slowly solidify it into Java?

I primarily do research programming, in which the specific goals of the software are largely undefined & the flexibility of the code should be one of the proximal design considerations, so I definitely prefer the Python(/Perl/Ruby) mindset in day-to-day work. There is a question in my mind, though, about where future bioinformatics software efforts will aim: I doubt that the current loosely-coupled/badly-specified project-specific protocols for genome databases and service frameworks will last, so where next? We could either start developing specifications (e.g. the distributed annotation system (DAS) or MAGE) or implementations (e.g. GMOD). If the former, there will be a significant barrier to entry for new projects, as they will need to spend time developing to the standard and confirming adherence. (This is the primary reason why DAS is a failure, I think.) If the latter, I predict a general tendency towards complexity of internal design as different projects try to cram all their needs into a single system. Either situation would be bad.

My preference is for what I think is a middle ground: the development of APIs around common tasks, in a variety of languages. The idea would be to take protocols like DAS and provide fairly simple library implementations that give you 90% of the needed functionality with 10% of the code complexity (based on the well known 90%/10% rule ;). The key is to make sure the implementations work well enough to do something useful & are in enough languages that e.g. the lone maverick Python/OCaml/Ruby programmer in the sea of Perl & Java programmers wants to play as well (just as one example!).

At the moment there are few tasks generic enough to be encapsulated by such an approach: the two that I can think of are annotation & microarray data presentation. Annotation suffers from a general lack of interoperability: not only does everyone have their own standards, but features don't transfer well between standards. I hear microarray data is the same, although I don't work with it much. It'd be interesting to try to work around the ontology problems (do you *really* want to define an ontology before getting your work done!?) to produce a genuinely useful annotation UI that interoperates. I don't see one out there that's usable by "mere" biologists, and I think that's the right target audience...

Why not use, say, XML? Well, properly grokking XML is burdensome and the whole process is pretty legalistic (lots of people yakking etc.). Since the goal is to lower ease of entry I think it's important to have some functioning libraries as soon as possible -- that way people can get the thrill of having the code actually work. When the library moves towards a standard, projects that are already functioning will at least have some reason to move with that library...

Hats off to the Chinook folks, who are developing a P2P bioinformatics system; you can access the code via CVS, finally.


Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!