Genend Update 2.33421
Still having problems loading the full data sets into memory for the combined Bacteria + Archaea genomes. I need to come up with a good way to do this for the 67/80/90% runs. Right now, I can only do it with Archaea alone.
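One possible way around the memory limit is to stream the genomes and keep only one sequence plus its k-mer counts in memory at a time. The sketch below is only an illustration under assumptions: it assumes the input is FASTA text, and the names `read_fasta`, `kmer_counts`, and `stream_profiles` are hypothetical, not part of the existing code.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) one record at a time from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def stream_profiles(lines: Iterable[str], k: int) -> Iterator[Tuple[str, Counter]]:
    """Yield (header, k-mer counts) per record; only one sequence is held at a time."""
    for header, seq in read_fasta(lines):
        yield header, kmer_counts(seq, k)
```

Because `stream_profiles` is a generator, it can be fed an open file handle directly (assumption: one FASTA file per run), and the raw sequence of each genome can be discarded as soon as its counts are produced.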
The results for the run strike me as being somewhat odd. You’ll see below…
Despite having gone over the algorithm repeatedly, I’ve been unable to find a fault in it. As near as I can tell, it’s doing exactly what I thought it should be. I thought it was odd that the results for 3-6mers are about the same regardless of how much training data is used (training on 50% showed almost identical results as well). The oddest thing is that the results drop off after peaking at either the 6-mer or the 7-mer. That’s the part that makes no sense to me. I’m not sure what to make of it.
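A possible sanity check, offered as a guess rather than a diagnosis: the number of distinct k-mers grows as 4^k, so for longer k-mers the counts taken from a fixed-length sequence may become sparse. The sketch below compares the number of k-mer windows in a sequence to the size of the k-mer space. The function name `kmer_coverage` and the 10,000 bp sequence length are hypothetical placeholders, not values from these runs.

```python
def kmer_coverage(seq_len: int, k: int) -> float:
    """Ratio of k-mer windows in a sequence to the 4**k possible k-mers."""
    windows = max(seq_len - k + 1, 0)
    return windows / 4 ** k


# Hypothetical sequence length; substitute the real lengths used in the runs.
for k in range(3, 10):
    print(k, round(kmer_coverage(10_000, k), 3))
```

If the ratio falls well below 1 around the k where the results peak, sparsity would be one candidate explanation worth ruling out before looking further at the algorithm.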
Maybe I’m missing something obvious. I’ll switch to something else for a bit and come back to it.