Older blog entries for hereticmessiah (starting at number 9)

Thanks to Bram for his comments on the diffing algorithm. Much appreciated.

28 Feb 2004 (updated 1 Mar 2004 at 18:25 UTC) »

Am I an idiot or what! There were a few stupid errors in that diffing algorithm I threw up (give me a break: they don’t have python installed here in college). Here’s a corrected version. Again, any comments are welcome.

def GetDiff(oldLines, newLines):
    """Constructs a diff to get from newlines to oldlines."""
 
    # Construct our special "hash" table. Using primes for the table length
    # would be better, I know, but odd numbers are simpler.
    tblSize = len(newLines) * 2 + 1
    tbl     = tuple([] * tblSize)
    for iNew in range(len(newLines)):
        iBucket = hash(newLines[iNew]) % tblSize
        tbl[iBucket].append(iNew)
 
    # Holds our diff information.
    result = []
 
    # Find the common subsequences and differences.
    iOld = 0
    iMax = 0
    while iOld < len(oldLines):
        iBucket = hash(oldLines[iOld]) % tblSize
 
        # Start with no match
        nMax = 0
        for iNew in tbl[iBucket]:
            # if nMax > 0 and iNew < iMax + nMax:
            #     continue
            nCur = 0
            try:
                while newLines[iNew + nCur] == oldLines[iOld + nCur]:
                    nCur += 1
            if nCur > nMax:
                nMax = nCur
                iMax = iNew
                iEnd = iMax + nMax
                if iEnd == len(oldLines) or iEnd == len(newLines):
                    break
 
        # Have we found a matching subsequence?
        if nMax == 0:
            # Nope.
            result.append(oldLines[iOld])
            iOld += 1
        else:
            # Woohoo!
            result.append((iMax, nMax))
            iOld += nMax
 
     return result

There old version is in my last entry.

Part of my final year project is a version control system. It’s not the most important part, but it is important.

I searched about for delta compression algorithms, but all of them were either inappropriate or would take too long for me to implement (not to mention overkill for a college project!). So I came up with my own line-oriented one. Last night I wrote a quick implementation of the algorithm out on paper in python. I haven’t tested it out yet, so I’m not sure about its performance. The end product will be written in C++.

def GetDiff(oldLines, newLines):
    """Constructs a diff to get from newlines to oldlines."""

# Construct our special "hash" table. tblSize = len(newLines) * 2 + 1 tbl = tuple([[] for i in range(tblSize)]) for i in range(len(newLines)): pos = hash(newLines[i]) mod tblSize tbl[pos].append(i)

# Holds our diff information. result = []

# Find the common subsequences and differences. iOld = 0 iMax = 0 while iOld < len(oldLines): pos = hash(oldLines[iOld]) mod tblSize

# Start with no match nMax = 0 for iNew in tbl[pos]: nCur = 0 while nCur + iNew < len(newLines) and \ newLines[nCur + iNew] == oldLines[nCur + iOld]: nCur = nCur + 1 if nCur > nMax: nMax = nCur iMax = iNew

# Have we found a matching subsequence? if nMax == 0: # Nope. result.append(oldLines[iOld]) iOld = iOld + 1 else: # Woohoo! result.append((iMax, nMax)) iOld = iOld + nMax

return result

If anybody has any thoughts on it, I’d love to know.

Reading two books on Swarm Intelligence: Swarm Intelligence and Swarm Intelligence: From Natural to Artificial Systems. Zzzz...

9 Jul 2003 (updated 9 Jul 2003 at 15:03 UTC) »

Got a H1 overall in my third-year finals (which is about as good as you can do), but I have to repeat Computer Science ’cause I screwed that exam up completely.

Feeling rather depressed right now, but for other reasons besides my exams. All personal, all chemical (I’m unipolar, y’see). :-(

On the other hand, talideon.com is up and running again, which is something deserving of celebration. I must cheer up...

Building a new network stack for the WIMU basestation and nodes. Is it just me, or do electronic engineers have some sort of mental block against coding properly?

16 Jun 2003 (updated 16 Jun 2003 at 10:14 UTC) »

This week I’m working on getting one of the WIMU units here to talk to a base station via an RF link. It’s a chance to try out the radio network stack I’m working on at last! Also developing a GUI for it, so it’s time to brush up on my Tcl!

I took a look at Eric Kidd’s XML-RPC for C and C++ library, but I think I’m just going to code the diary/.plan file synchroniser with sockets myself. It’s too awkward to set up the Expat and the XML-RPC libraries here with the administrative restrictions on the Solaris boxen here at work. I hate reinventing the wheel, but seeing as all I need is a little cog and not F1 slicks, it’s probably for the best.

Ok, here’s an idea that’s after popping into my head.

I keep a .plan file here in my Unix account at work. Mostly, it’s just to keep track of what I’m doing development-wise.

Now, I also have my advogato log, but it’d be more convenient for me to update my .plan file and use the XML-RPC Interface to update my Advogato log with it.

Hmmm... sounds like a good idea. Now all I have to do is find the time to do it!

Was bored earlier and converted my signature generator for mutt to C and added a couple of features. It’s available here.

5 Jun 2003 (updated 5 Jun 2003 at 14:21 UTC) »

Just in case you’re wondering, this diary tracks development work I’m doing at the moment. Right now, I’m doing work on TinyOS and NesC for the NMRC as part of my six-month internship for my degree.

I’m just after getting PWS, OpenWiki, Tcl, and MiKTeX set up on the Ambient Lab’s one and only Win98 box after a huge amount of trouble. Administrators suck.

The next bit of work I’m going to be doing on ncc will be to try to get it to run natively on Windows. There are good reasons (the engineers here find Cygwin a bit scary) and bad (the administrators) for doing the port. I’m building it using MinGW and creating a library to encapsulate the things ncc requires that Cygwin has and MinGW doesn’t. For a start, nesc-cpp.c needs a fair bit of work to get rid of the *nixisms within and that’s where I’m starting.

Those in the NMRC Ambient Systems group who might be reading this can find the wiki I set up at http://mailab-pc/wiki/.

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!