Older blog entries for rufius (starting at number 40)

More results, new and improved software

Switched from Python to Java (sigh) to improve speed. Python (besides being dynamic) has worthless threading in comparison to Java. The Java version is faster by a lot: runs on just Archaea took 4 days, and a run on Archaea + Bacteria (625 genomes) now takes something in the range of 12 hours.

Ran into problems with threading but learned a lot while doing it. Will definitely make this process faster. Runs are still slow, though, and there is not much room for improvement without moving to a cluster.

Loading full relative distributions for 9-mers is currently not possible without more consideration of the program. Maybe switching from HashMap<String, Double> to an array of doubles (double[]) will save some space. Need to investigate that further; we’ll see.
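A minimal sketch of the array idea, in Python rather than Java for brevity. It assumes uppercase sequences over A/C/G/T only (windows containing anything else are skipped), and the names `kmer_index` and `relative_distribution` are made up for illustration, not taken from the actual program. Each k-mer maps to an integer, so the distribution becomes a flat array of 4^k doubles with no string keys; for 9-mers that is 262,144 slots, about 2 MB per distribution.

```python
from array import array

# 2 bits per base; any other character makes a window invalid in this sketch.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer):
    """Map a k-mer over ACGT to an integer in [0, 4**k)."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_CODE[base]
    return index


def relative_distribution(sequence, k):
    """Return an array of 4**k doubles holding relative k-mer frequencies."""
    counts = array("d", [0.0]) * (4 ** k)
    total = 0
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        # Skip windows with ambiguous bases such as N.
        if all(base in BASE_CODE for base in kmer):
            counts[kmer_index(kmer)] += 1.0
            total += 1
    if total:
        for i in range(len(counts)):
            counts[i] /= total
    return counts
```

The same indexing would carry over to a Java double[] directly; whether it saves enough space for 9-mers across all genomes still needs measuring.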

Results for piece sizes 36, 100, and 200 for 3- through 8-mers (excluding 8-mers at piece size 200) are as follows:

Still waiting on a run to finish for 200 piece size 8-mers, then will run the taxonomic classifier. Unsure how well matching data in the db to data from the files will go. Not sure the “species” match up between the two in the right way.

Syndicated 2008-11-06 17:24:29 from blog.zacbrown.org - just run away, now.

Old data bad, New data good, Program too slow

So the last set of data posted is definitely incorrect. Found flaws in the scripts’ function to generate relative distributions. Also modified the original identification script to work with classifying organisms.

The data for correct identification below…

The data for phylogenetic classification below…

Full bacterial and bacterial+archaeal analysis will be harder as the current program is too slow. Rewriting parts to make the process faster. Possibly working in OCaml to do this.

Syndicated 2008-10-16 17:43:11 from blog.zacbrown.org - just run away, now.

More genomics…

After some misunderstanding, now have a program that does what is needed. Seems slow, and memory constraints make loading higher-level distributions difficult (k-mer size > 9).

Started a run last night (~18:00) on 625 genomes (50 Archaea, 525 Bacteria), still running. Got no significant results from 3-5-mers, now running on 6-9-mers.

Have a completed run from just doing Archaea, results not so great, around 1.1-1.6% success in identification with 10000 samplings. See graph below:

Syndicated 2008-10-02 18:00:59 from blog.zacbrown.org - just run away, now.

Genend - Update 1

Moved from Perl to Python. Extensive use of Perl in larger files proved to be hard to organize for myself, was having trouble keeping straight what I was doing. Also don’t like the Perl object/class system, more at home with Python’s.

Current progress includes a custom database object for interfacing with a SQLite database (and possibly PostgreSQL/MySQL/Firebird if it gets too slow). Everything except ‘updates’ to an entry is done. Database object is about as simple as it gets, using a list of tuples for adding k-mers and a large tuple for taxonomy.
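A rough sketch of what such a database object's core could look like with Python's built-in sqlite3 module. The table layout, column names, and function names here are assumptions for illustration; the real schema isn't shown in this entry.

```python
import sqlite3


def open_database(path=":memory:"):
    """Create the (hypothetical) tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS taxonomy ("
        " species TEXT PRIMARY KEY, kingdom TEXT, phylum TEXT, genus TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kmers ("
        " species TEXT, kmer TEXT, frequency REAL,"
        " PRIMARY KEY (species, kmer))"
    )
    return conn


def add_taxonomy(conn, taxonomy):
    """Insert one taxonomy tuple: (species, kingdom, phylum, genus)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO taxonomy VALUES (?, ?, ?, ?)", taxonomy)


def add_kmers(conn, species, kmers):
    """Insert a list of (kmer, frequency) tuples for one species."""
    rows = [(species, kmer, freq) for kmer, freq in kmers]
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO kmers VALUES (?, ?, ?)", rows)
```

Passing the whole list to `executemany` inside one transaction avoids a commit per row, which tends to matter a lot for SQLite insert speed.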

Started working on an object that takes a directory full of genomes, an output directory, and the number of threads to run, and pools objects to process the files. Will have a threadable object that uses the BioPython libraries to parse the genome files. An important question for queueing threads is whether SQLite will tolerate concurrent access to the same database. Need to figure out how to handle inserts so that there isn’t fragmentation. There should be little fragmentation, as each file and species will be unique.
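One way to sidestep the concurrent-access question, sketched under assumptions: worker threads only count k-mers, and the thread that owns the connection does every insert. SQLite allows only one writer at a time, and a Python sqlite3 connection by default may only be used from the thread that created it, so a single writer keeps things simple. The names `count_kmers` and `process_genomes` and the `kmer_counts` table are hypothetical, and plain strings stand in for genomes parsed with BioPython.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_kmers(name, sequence, k):
    """Worker: count k-mers in one genome; touches no shared state."""
    counts = {}
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return name, counts


def process_genomes(genomes, k, num_threads, conn):
    """Count on a thread pool; only the calling thread writes to SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kmer_counts ("
        " genome TEXT, kmer TEXT, count INTEGER)"
    )
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(count_kmers, name, seq, k)
                   for name, seq in genomes.items()]
        for future in as_completed(futures):
            name, counts = future.result()
            rows = [(name, kmer, n) for kmer, n in counts.items()]
            with conn:  # one transaction per genome
                conn.executemany(
                    "INSERT INTO kmer_counts VALUES (?, ?, ?)", rows)
```

Worth noting that in CPython the counting itself is CPU-bound and serialized by the global interpreter lock, so threads here mostly help with overlapping file I/O; the structure, not the speedup, is the point of the sketch.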

For next week:

Finish up database object and threading objects. Do preliminary run to start building genomes. Determine the largest feasible genome before the laptop (2×2.4GHz w/ 4 GB RAM) will puke. If it does so before getting too high in the phyla, will need to start writing some string-handling libraries in C to deal with static-length strings.

Syndicated 2008-09-18 18:19:16 from blog.zacbrown.org - just run away, now.


Just got back from LA. My stint with Google is over :(. It was a great experience despite my awful apartment.

Managed to get myself some confidence in my abilities and am now working on two OSS projects.

Back to the grind of school now. At least I’ll be starting a new research project.

Syndicated 2008-08-19 14:11:54 from blog.zacbrown.org - just run away, now.

Oppose the Orphaned Works Act of 2008!

I won’t go into full detail of its evil here, as you can read and get the gist of it here: http://blamcast.net/articles/orphaned-works-open-source-copyright.

Essentially, it means any company can take a piece of software (among other things) and “claim” they looked for the author and then use it without obeying the GPL license on it. When the copyright holder sues on grounds of infringement, the people that violated the copyright merely have to provide “proof” that they looked for an author and could not locate one.

The only way to really prevent the infringement, if the bill passes, is to register with some sort of copyright registry, which no doubt costs money.

Oppose it! You can go here: http://capwiz.com/illustratorspartnership/home/ to find out how to easily write your congressman/woman.

Syndicated 2008-07-05 17:05:05 from blog.zacbrown.org - just run away, now.

Do something nice today.

To be honest, I’m long overdue on saying something. I am alive, just barely, but I am.

I came across this article today. It made me realize that society has become disenchanted with… well society itself. Do something nice today for someone you don’t even know. Seriously.


Syndicated 2008-03-28 22:14:09 from blog.zacbrown.org - just run away, now.


Long two weeks, many tests, projects and homework to be done.

With that, I leave you with some food for thought: “When the rich wage war it’s the poor who die.” - Mike Shinoda

Syndicated 2008-02-27 17:15:27 from blog.zacbrown.org - just run away, now.

the dark arts

I am now as we speak ever approaching the dark arts. Actually I’m just going to be working on compilers and filesystems for the next two years in a directed study with a professor in the Computer Engineering department. A friend in IEEE needs help with his senior project so that’ll be the beginning of my learning about compilers. Filesystems shall follow shortly thereafter.

Should be fun and challenging, seeing as I get frustrated to no end with the obsession of online frameworks and building social networking sites. Not that I think those are bad or trivial things, I just don’t find them to be particularly interesting problems. Even if compilers and filesystems are no longer “cool” to the academic types, I still think they are.

And now I’m off to read more of “Functional Programming: Application & Implementation” by Henderson. If anyone actually reads this, stay tuned. A little while from now I may be posting a Lispkit implementation with a compiler that builds native code which I know is a dirty dirty thing and I should have my mouth washed out with soap for it but I don’t care :).

PS: More wine fun is coming as well :) Hooray for shell32!

Syndicated 2008-02-01 17:26:47 from blog.zacbrown.org - just run away, now.

21 Dec 2007 (updated 10 Jan 2008 at 01:38 UTC) »

cheers & jeers


So I found out a couple evenings ago that I got the internship with Google. I would start sometime in mid-May and finish up sometime in mid-August. It would be good fun, though it is mildly disappointing that my friend (Danny) who works for Google wouldn’t be there during the time I’d be there. He’s finishing up his last semester of college and will be arriving there about the time I would be leaving.

I’m excited and anxious all at the same time, dealing with getting myself a place to sleep will be interesting. Maybe I can just sleep under my desk and shower at the office?


I was less than pleased with my grades this semester. I ended up with a B+ in both Algorithms & Software Eng. This could change though for Algorithms as I emailed the professor and he said we could speak about possibly getting me an A- which would be great.

Software Engineering on the other hand, I apparently bombed the final. That puzzles me but I’m going to meet with the professor. Surely I didn’t do that poorly? I’m not a bad student…


Anyway, back to programming on wine

Syndicated 2007-12-21 02:03:20 from blog.zacbrown.org - just run away, now.
