Older blog entries for welisc (starting at number 38)

11 Apr 2003 (updated 7 May 2003 at 12:11 UTC) »
SuMO is a new server for finding ligand-binding sites in proteins. It's the first example I've seen of a bioinformatics application written in OCaml.

Been trying out Jeff Shrager's tutorial "Intelligent Computational Biology in BioLisp".

Avoiding thesis work by writing an Eiffel version of Goldberg's Simple Genetic Algorithm. Appears to work but performance is disappointing: about half the speed of the C version. Turning off GC makes little difference.
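For readers who haven't met it, the simple genetic algorithm boils down to roulette-wheel selection, one-point crossover and bitwise mutation over a fixed-size population of bit strings. What follows is a minimal sketch in Python, not the Eiffel or C code mentioned above. The OneMax fitness function, the parameter values and all the function names are assumptions chosen purely for illustration.

```python
import random


def one_max(bits):
    """Toy fitness (an assumption for this sketch): the number of 1 bits."""
    return sum(bits)


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    spin = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, p_cross, rng):
    """One-point crossover with probability p_cross, otherwise plain copies."""
    if len(a) > 1 and rng.random() < p_cross:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(bits, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in bits]


def simple_ga(n_bits=30, pop_size=30, generations=50,
              p_cross=0.6, p_mut=0.03, seed=1):
    """Run a generational GA and return the best bit string seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        fitnesses = [one_max(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            mum = roulette_select(population, fitnesses, rng)
            dad = roulette_select(population, fitnesses, rng)
            kid1, kid2 = crossover(mum, dad, p_cross, rng)
            next_gen.append(mutate(kid1, p_mut, rng))
            next_gen.append(mutate(kid2, p_mut, rng))
        population = next_gen[:pop_size]
        best = max(population + [best], key=one_max)
    return best
```

Calling `simple_ga()` returns the fittest bit string found. Fixing `seed` makes runs repeatable, which helps when comparing timings across implementations.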

6 Feb 2002 (updated 22 Jan 2003 at 12:44 UTC) »
OCaml and scsh

chalst suggests scsh as an alternative to OCaml. My main interest in OCaml is as a language that I can write code in as quickly as in a scripting language and then compile to get high performance if necessary. (My real interest is in modelling protein structures, which needs every drop of performance I can get.) That said, I like the design philosophy behind scsh and the idea of embedding domain-specific little languages.

The only real reasons I prefer OCaml over Scheme are that it's statically typed and that it has a reputation for being very fast. I'm not sure these are good reasons. Scheme programmers seem to manage just fine without static typing, and Brad Lucier showed that, using Gambit-C, you could get performance equivalent to C from Scheme code for number-crunching PDEs.

I think I'm a Scheme programmer at heart but for mercenary reasons I'm learning to live with C++ and trying to ignore my suspicion that using Scheme would help me to "beat the averages".

Invested in a copy of The C++ Programming Language last weekend.

My other language is OCaml.

Tinkering with OCaml. An example in Chapitre 12 of the O'Reilly OCaml book shows how blocks allocated in external C code using malloc() can be reclaimed automatically by the OCaml GC using a finalisation function that calls free(). This would be very convenient for writing an OCaml interface to GSL.

Working on extending the loop modelling program. Had a brief flirtation with doing it in Eiffel but decided to switch to C++ for mercenary reasons. Suspect that if I'd done it in Eiffel, I'd have a working program by now instead of being bogged down in the myriad details of C++.

13 Dec 2001 (updated 9 Jan 2002 at 20:09 UTC) »
pphaneuf reports that the physicists he works with don't seem to understand how to use MPI efficiently or why passing around gigabyte-sized structures instead of pointers is a bad idea. They might be confused by the difference between Fortran's call-by-reference convention and C's call-by-value convention. In any case, this is the kind of really basic stuff that scientists who program should know.
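The underlying point is the difference between handing a function its own copy of a big structure and handing it a reference to the original. Here is a loose Python analogy, not MPI and not the actual C or Fortran semantics: Python always passes object references, so the copy has to be made explicitly. The function names are made up for illustration.

```python
def update_copy(data: bytearray) -> bytearray:
    """Work on a private copy: costs O(len(data)) time and memory,
    and the caller's buffer is left untouched."""
    local = bytearray(data)  # explicit full copy, like passing a struct by value
    local[0] = 1
    return local


def update_in_place(data: bytearray) -> None:
    """Work through the reference: no copy is made,
    and the caller sees the change."""
    data[0] = 1


original = bytearray(8)            # imagine this being gigabytes, not 8 bytes
changed = update_copy(original)    # original[0] is still 0 afterwards
update_in_place(original)          # now original[0] is 1
```

With a gigabyte-sized buffer the first style pays for a full copy on every call, which is the cost being complained about above.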

Made my first foray into the world of GUIs with a Tcl script to build an interface to MolScript that partially automates the tedious cycle of writing a script, feeding it to MolScript and then reloading the image file. It's really nothing more than a glorified text editor but it does the job.

Installed Mozilla 0.9.6. It seems that they fixed the problem that resulted in a crash when using the Back button to move backwards within www.oreillynet.com. It still doesn't display the bookmarklet link on the Blogger Settings page.

12 Dec 2001 (updated 12 Dec 2001 at 21:19 UTC) »
AlanShutko thinks that scientists who write software need to learn computer science or they'll end up writing buggy, inefficient code. I completely agree, which is why I've been making an effort to learn at least the basic principles of software engineering and algorithms. Bioinformatics is dealing with some of the most complex systems ever discovered and the software to handle that is going to have to be complex and very efficient. If you're doing bioinformatics you have to learn computer science and it is by no means easy, although it is rewarding.

12 Dec 2001 (updated 9 Jan 2002 at 20:11 UTC) »

There's an interesting interview with Lincoln Stein on perl.com. He says that computer scientists find it much harder to learn biology than biologists do to learn computer science because computer scientists need to learn a new paradigm while biologists are just picking up another skill.

He makes it sound so easy. I tell you it is not.

tk pointed me to Psyco, a compiler designed to execute Python at near the speed of compiled languages. Erann Gat proposed Lisp as an Alternative to Java; Psyco offers the prospect of being able to propose Python as an alternative to C, but it'd be worthwhile even if it only allowed Python to be substituted for Java. Python is attractive because the Pmv project is using Python to develop components for structural bioinformatics.

11 Dec 2001 (updated 9 Jan 2002 at 20:14 UTC) »

It's occurred to me - after a mere four years - that I've spent much more time writing code for extracting propensity tables and loop modelling than I have actually running it. Since the output of the programs is what I'm really interested in, it's obvious that I should be trying to minimise the development time.

Using Perl or Python would certainly reduce development times but at a cost in terms of performance. However, if it saves a lot of development time, this might actually offset the increased running time to the extent of reducing the overall time to get results. And there's always the option of using SWIG to drop down into C for the heavy-lifting bits.

This of course still means having to write the heavy-lifting bits in C or C++.
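The trade-off between development time and running time is simple arithmetic: time to results is development time plus the number of runs times the time per run. A back-of-envelope sketch follows. Every number in it is a made-up placeholder rather than a measurement, and the function names are hypothetical.

```python
def time_to_results(dev_hours, run_hours, n_runs):
    """Total hours from starting to code until the last run has finished."""
    return dev_hours + run_hours * n_runs


def break_even_runs(dev_fast, run_fast, dev_slow, run_slow):
    """Number of runs at which a quick-to-write but slow-running program
    (dev_fast, run_slow) stops beating a slow-to-write but fast-running
    one (dev_slow, run_fast). Assumes run_slow > run_fast."""
    return (dev_slow - dev_fast) / (run_slow - run_fast)


# Placeholder figures: a script taking 40 h to write and 10 h per run,
# against a C++ program taking 200 h to write and 1 h per run.
script_total = time_to_results(dev_hours=40, run_hours=10, n_runs=5)
cpp_total = time_to_results(dev_hours=200, run_hours=1, n_runs=5)
crossover_point = break_even_runs(dev_fast=40, run_fast=1,
                                  dev_slow=200, run_slow=10)
```

With these placeholder figures the script wins for anything under about 18 runs; beyond that the compiled version pays back its extra development time.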

I really like functional languages: they let you write high-level code and write it quickly, and then compile it to get optimal performance. The problem with functional languages is that the paradigm is very different to imperative programming so there might be problems persuading coworkers that they're a good idea. There's also the perennial problem of people not wanting to use lesser-used languages that add little or no value to a CV.

I suspect I'm going to end up with a compromise solution such as gcj-compiled Java, a language whose only selling point for me is that it eliminates a lot of the complexity of C++ (while also eliminating some of the good features of C++).
