Older blog entries for apm (starting at number 16)

Argh!!! I was in the middle of updating the Suneido website when the server started giving errors and I couldn't finish the update. Now stuff is "broken" and I can't fix it! DataPipe hosts our site, and for the most part they've been pretty good, but it's frustrating in this kind of situation to have to wait for their tech support to get around to looking at the problem. I'm tempted to set up our own server but I'm not sure our internet connection is up to it. We're in a research park so the bandwidth is shared. Can't really afford to get our own dedicated connection. Maybe someday.

I was feeling pretty good about things before this happened. The site has been up for over a year, and I've managed to average a release every month. And so far the releases have all been pretty fair quality, IMO.

Progress is always slower than I'd like, but I don't really feel like working more than the 60 or 70 hours a week I already put into it. Suneido has attracted a few regular small-scale contributors, but so far no one else has really got involved in a major way. Not that I had any naive notion that masses of open source developers would immediately flock to the project. "If you build it, they will come." doesn't really apply! There are a zillion open source projects and so far, obviously, Suneido hasn't convinced anyone it's worth major investments of time. Maybe that'll come, and maybe not. In the meantime, I'll keep plugging away. I just hope our website comes back!

Something that's been on my todo list for Suneido for a long time is cleaning up a few places in the client-server network code where there might be bad interaction with the TCP/IP Nagle algorithm. However, I'd never actually done any testing to see if it was a real problem.

I won't try to explain the Nagle algorithm in detail. I'd recommend "Effective TCP/IP Programming" by Snader if you're interested. But in short, it's a standard technique used to improve the efficiency of TCP/IP by holding back small packets while earlier data is still unacknowledged. However, it assumes you alternate between sends and receives, e.g. send a request, receive a response. If you don't follow this pattern, for example, you do two sends in a row (e.g. a header and then data), then the Nagle algorithm, interacting with the receiver's delayed acknowledgments, can slow you down to around 200ms per send/receive, or only 5 "messages" per second - pretty slow. One "fix" is to simply disable Nagle, but then you lose the improvements it brings.

So, last night I fixed the code to combine multiple sends and this morning I did some quick benchmarks. On the client side, the places affected were output, update, and delete. Sure enough, with the old code I was only getting 5 outputs per second (ouch!). With the new code I'm getting 2500 outputs per second, or 500 times faster! Not bad for a few hours' work! (NOTE: This only affects client-server operation across the network, not standalone use. Standalone I get about 8000 outputs per second.)
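
To make the difference concrete, here is a minimal Python sketch of the two patterns. It is not Suneido's actual network code: the 4-byte length header and all of the function names are assumptions made up for the illustration. The first sender does two writes per message (the pattern that interacts badly with Nagle), the second builds the header and data into one buffer and does a single send.

```python
import socket
import struct

# Hypothetical wire format: a 4-byte big-endian length, then the data.
HEADER = struct.Struct("!I")

def send_message_two_writes(sock, payload):
    """Write-write-read pattern: header and data go out as separate sends."""
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(payload)

def send_message_combined(sock, payload):
    """Combine header and data into one buffer so there is a single send."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv may return less."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)
```

The receiver is the same either way; only the number of small writes per message changes, which is why the fix could be confined to the sending code.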

It just goes to show that knowledge is a powerful tool. If I hadn't read about this problem it could have been a long time before I figured it out. Of course, if I had any brains, I'd have made this change a long time ago!

Gave a slide show on our spring Shishapangma expedition. It's always a lot of work pulling a slide show together out of literally thousands of slides. Fun to look at them though. Some good pictures. We had a pretty good turn out and it seemed to go over well.

Still plugging away at Suneido. Activity has been a bit down this week, both downloads and the forum and mailing list. We just passed 2500 registered downloads (lots more unregistered). Had an inquiry from a Brazilian company wanting to put Suneido on their CD. We'd previously had a German computer magazine wanting to include Suneido on a CD but I'm not sure if it ever was.

I've been working on trying to finish Suneido's Version Control system. We've been using it in-house for probably 6 months, but I never seem to find time to finish it and document it. Maybe I'll be able to get some more done on it this weekend.

Should prepare a new release before the end of the month as well. And update CVS on SourceForge, although I don't think anyone's really using CVS yet.

One of our users was talking about deploying an application he'd written with Suneido. That would be a milestone. I don't think anyone has deployed any applications outside of our office. Hope it works well for him.

It seems like I'm spending more and more time doing "support" for Suneido. On the positive side, that means people are actually trying to use it. However, it also means I don't get as much done. A lot of the questions (but not all) would be answered by better documentation. I should be able to use some of my responses as a starting point for some documentation. I'm afraid I can't really get excited about writing documentation. I know it's worthwhile, and if I want Suneido to be successful, it's going to have to be done. But ... I'd rather be programming. I can't even really fantasize about someone else doing it for me, since there aren't many people who know it well enough (yet).

Did get a little bit done today on int64 support for the dll interface. It's still not perfect because Suneido numbers don't cover the full range that int64 does. At some point I should probably extend the range of Suneido numbers. Currently they use 4 "digits" where each digit is 4 decimal digits (0 - 9999), for a total of 16 decimal digits. To cover the full range of int64s (which need 19 decimal digits) I'd have to add one more "digit" to handle 20 decimal digits. Shouldn't be too hard, but unfortunately, some of the code assumes 4 "digits". And, obviously, I don't want to break something as basic as numbers. I do have pretty extensive automated unit tests though, so hopefully they'll catch most problems. It's not a big priority, but one of our users is trying to implement an interface to MySQL and it uses 64 bit integers. He also wants float and double support in the dll interface. Shouldn't be too hard to add.
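
The digit arithmetic can be checked with a few lines of Python. This is only a sketch of the base-10000 idea described above, not Suneido's number implementation; the function names are invented for the example.

```python
BASE = 10_000  # each "digit" holds 4 decimal digits (0 - 9999)

INT64_MAX = 2**63 - 1  # 9223372036854775807, 19 decimal digits

def to_digits(n):
    """Split a non-negative integer into base-10000 digits, most significant first."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    digits = []
    while True:
        n, d = divmod(n, BASE)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]

def fits(n, num_digits):
    """True if the magnitude of n can be stored in num_digits base-10000 digits."""
    return len(to_digits(abs(n))) <= num_digits
```

With these definitions, `fits(INT64_MAX, 4)` is false and `fits(INT64_MAX, 5)` is true, which is the reason one extra "digit" is enough.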

The big things I want to get done for the next release are the version control and the unit testing framework. Both are more or less written and we've been using them in-house for quite a while. But they need some cleaning up, polishing, completing AND documenting. There's that damn documentation again!

Time to bike home. Not even 6pm and it's already getting dark. That's the problem with living so far north. Of course, the long days in the summer are nice. At least it was warm today - the snow we got last week is mostly gone.

It's been a while since I've been on Advogato. I noticed Helmut had put a link on Ward's Wiki to the Suneido project on Advogato. So I followed it and found he had updated it a bit - thanks Helmut!

Suneido has been keeping me busy. I set up Suneido on SourceForge - that was an interesting exercise. I'd been on SourceForge before, but never used it much, let alone set up and administered a project. It was pretty straightforward. There's quite a bit of documentation, some of it pretty good. But there are always holes in the documentation. I had a bit of a struggle setting up the necessary CVS tools on Windows, but finally got it working. A while back one of our contributors, Roberto Artigas Jr., had added simple language translation to Suneido and had supplied translation data for Spanish. So I posted a "job" on SourceForge for other translations. I didn't really expect much to come out of it. I'm afraid most postings asking for help don't produce much. But this time I actually got quite a few responses. I had three people offer to do Italian! The end result is that you can now run the Suneido IDE in Spanish, Italian, French, German, and Russian. Pretty neat.

We had a good trip to Shishapangma in the spring. Turned around 150 meters (of altitude) from the top due to time and weather. So close! But with the snow conditions we only averaged 50 meters per hour so the top was likely another 3 hours away. I think the success rate was pretty low this spring. A few people were making it up, but with extremely long hard summit days. Oh well, we had a good trip, no sickness or injury. We left Kathmandu as the leftists were stirring up trouble, and a few days later most of the royal family was murdered. We were glad we got out when we did. After the climb we spent a couple of weeks on beaches in Thailand. It was nice R&R after 5 weeks on the mountain. It was amazingly easy to keep in touch - there are internet cafes all over Nepal and Thailand. And not only that, but there's even a real Starbucks in Phuket!

It's been a year since we released Suneido open source on the web site. A busy, interesting year. I've learnt a lot. (Which is what it's all about if you ask me.) In some ways we've made a lot of progress, in other ways, we haven't accomplished as much as I would have liked. One of these days, if I can find the time, I'd like to write an article about Suneido's first year. C'est la vie.

A long day yesterday - left the house at 6:30am and got home at 8:30pm - 14 hours. But, hey, when your wife phones and says she won't be home till late, it's a perfect opportunity to get in a few more hours.

I was on a roll anyway. I integrated the new bitmapped memory manager (allocator and garbage collector) and it worked! I've had this finished for a while but I've been too chicken to drop it in. It underlies everything and bugs at this level can be both catastrophic and elusive. Just what you don't need. But it seems solid. Perhaps the XP (extreme programming) techniques (simple incremental development and automated tests) really work. And the performance seems at least as good as the old stuff, and probably better.

It should be much more efficient space-wise. Minimum allocation is 8 bytes instead of 16. (And there are a surprising number of these tiny objects - e.g. after startup, about 2000 out of 7000 objects are 8 bytes or less!) And there is *no* overhead on the blocks themselves (i.e. no "header" or "trailer"). The only space overhead is the bitmaps. Each bit in the bitmaps controls 8 bytes (64 bits) of heap space, so each bitmap is 1/64 or roughly 1.6% overhead. There are three bitmaps (one to mark block boundaries, one to use for mark/sweep, and one to designate non-pointer blocks) so the overhead is about 4.7%. Most memory managers have an overhead of 4 or 8 bytes per block. On small blocks (the majority in a system like Suneido) this can amount to as much as 25% space overhead.

The old memory manager also used a much smaller set of block sizes, so there was more wastage from having to use larger blocks than strictly necessary. And it also kept separate memory pages for each size, so there was also wastage per size. The new allocator is not page based, it simply allocates consecutive memory locations (very fast) and by its nature, the bitmapped garbage collection automatically coalesces all the free space, greatly reducing fragmentation. All in all it seems like a great approach. (You might think that these days, with memory plentiful, space efficiency is not important. But because of caching and virtual memory, it can have a direct effect on speed.)
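
The overhead figures are easy to reproduce. The following Python snippet is just the arithmetic, with made-up function names; the per-block comparison assumes a 4-byte header measured against a 16-byte block, which is one way to arrive at the 25% figure.

```python
GRANULE_BYTES = 8  # heap bytes controlled by each bitmap bit

def bitmap_overhead(num_bitmaps, granule_bytes=GRANULE_BYTES):
    """Fraction of heap space spent on bitmaps: one bit per granule per bitmap."""
    return num_bitmaps / (granule_bytes * 8)

def header_overhead(header_bytes, block_bytes):
    """Fraction of space spent on a per-block header, relative to the block size."""
    return header_bytes / block_bytes

one_bitmap = bitmap_overhead(1)         # 1/64 of the heap
three_bitmaps = bitmap_overhead(3)      # 3/64 of the heap
small_block = header_overhead(4, 16)    # assumed 4-byte header on a 16-byte block
```

The bitmap cost is a fixed fraction of the heap regardless of block size, whereas a per-block header costs proportionally more the smaller the blocks get.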

The new memory manager also supports "non-pointer" allocations, i.e. when you alloc something like a string, which you know will never contain pointers to other heap memory, you can tell this to the memory manager and then it doesn't need to scan these blocks during garbage collection. I made a few changes in the code so strings, numbers, dates, and compiled code are non-pointer. This resulted in about half of the heap being non-pointer, thereby cutting the garbage collection scanning in half.
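
As a rough illustration of why this helps, here is a toy mark phase in Python. It is a model of the idea only, not the real collector (which is conservative and works on raw memory); the Block class and its fields are assumptions for the example. Non-pointer blocks are still marked as live, but their contents are never scanned.

```python
class Block:
    """Hypothetical heap block: a name, outgoing references, and a non-pointer flag."""
    def __init__(self, name, refs=(), pointer_free=False):
        self.name = name
        self.refs = list(refs)
        self.pointer_free = pointer_free

def mark(roots):
    """Mark everything reachable from roots; return (marked ids, names scanned)."""
    marked = set()
    scanned = []
    stack = list(roots)
    while stack:
        block = stack.pop()
        if id(block) in marked:
            continue
        marked.add(id(block))
        if block.pointer_free:
            continue  # live, but nothing inside can point at other heap blocks
        scanned.append(block.name)
        stack.extend(block.refs)
    return marked, scanned
```

In this model a string referenced by an object ends up marked but does not appear in the scanned list, which is the saving described above.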

"finalization" is also supported. You can register a pointer with the memory manager and when the memory it points to is garbage collected (i.e. no more references to it) instead of being free'd it is added to a queue. Then I added an abstract SuFinalize base class (derived from SuValue) that registers instances when they are created. It has a virtual "finalize" method that derived concrete classes define to release their resources. Then in Suneido's main message loop, if there are no messages waiting it removes values from the queue and calls finalize on them. I modified SuFile to derive from SuFinalize, and renamed it's close method to finalize. Voila, files are now automatically closed if you forget to do it. Of course, there's no guarantee on the timing of finalization. Reference counting can detect unreferenced objects immediately and predictably. But conservative garbage collection is not quite so predictable. Often, spurious pointers to objects will keep them "alive" for some time after they are actually "dead". But at least you have some assurance that resources will not "leak" indefinitely. Now I have to modify Suneido's other "resource holding" values to use this facility - e.g. Transaction, Cursor, Image. Windows "handles" will take a little more work because currently they're just treated as integers. I'll have to define a Handle type and change the dll definitions. But this will be an improvement anyway as it will provide some type safety (i.e. stop you from passing any old number as a handle).

Had a great bike ride to work this morning - it was cold, windy, cloudy, and snowing - brisk, you might say. To understand why I would call that "great", you need to know that I leave in three weeks to lead a mountain climbing expedition to an 8000m mountain in Tibet called Shishapangma. So besides the physical training, I'm working on mentally training myself to enjoy "interesting" conditions :-) I've been so wrapped up in Suneido that I'm feeling a little "separation anxiety". But I need the break and I'm looking forward to a simpler, more physical challenge, with a well defined goal and a clear definition of "success".

Anyway, time to get to work - lots to do!

Hey! I picked up the new 25th Anniversary issue of Dr. Dobb's Journal today and was flipping through it while waiting for a compile (I need a faster computer!). One of the columns I always read is Michael Swaine's Programming Paradigms. Lo and behold he devotes a paragraph to Suneido! I had sent him an email about it, but I'm sure he gets a ton of email and I didn't really expect anything to come of it. He doesn't really give any opinion of it, positive or negative, but the fact that he mentioned it seems positive in itself. A mention of Suneido on the Dr. Dobb's website gave us a lot of traffic, so hopefully the magazine will be even better.

It would be very useful in Suneido to have a way to automatically release resources such as Windows "handles" and open files. Suneido's garbage collection will automatically release the memory but it doesn't know anything about resources. Currently, programs must explicitly release their resources when they're finished with them e.g. close files. In user interface code this is often done in response to the WM_DESTROY message. But in other places e.g. report formats, there is no obvious place to do this. It can be especially hard to ensure that resources are released in the event of exceptions (errors). It's really the same kind of problem that garbage collection solves for releasing memory.

Initially, I thought the way to do this would be to support "finalization" methods in Suneido. But after looking into it further, I think a better approach is to add "weak references" to Suneido, and to use them to release resources. My current feeling is that this is something that should be added to Suneido in the near future.

The garbage collector calls finalization methods after determining an object is no longer referenced, but prior to free'ing it. Finalization methods can release resources such as file handles that are used by the object. Java has finalization. It seems like a simple, obvious solution. However, there are a number of subtle problems with finalization. One problem is that the order of finalization is not predictable. Another problem is that it is possible for finalization methods to "resurrect" an object by storing a reference to it. This in turn "resurrects" any objects that this one references. But these other objects may already have been finalized! There are ways to work around these issues, but obviously it's not as simple as it might seem.

"Weak references" are another memory management facility. A weak reference stores a reference so a memory object in a way that doesn't stop the object from being free'd by garbage collection. At the end of garbage collection, any weak references that refer to objects that are now free'd are cleared. This might work as follows:

    wr = WeakReference(value)
    wr.Get() => value // value is still "live"
    // some time later
    wr.Get() => False // value has been free'd

Weak references are a more primitive mechanism; they are simpler to implement and do not have the same problems as finalization.

The key idea is that you can use weak references to implement a scheme to automatically release resources. The basic method is:

  • When you allocate a resource, you add it to a table.

  • The table contains an entry for each resource, including a weak reference to the object using the resource, and information about how to release the resource (e.g. the handle value and a "release" function)

  • Periodically (possibly by a background thread) the table is scanned for entries whose weak reference has been cleared. Using the information in the entry, the resource is released, and the entry is then removed from the table.

One enhancement is to have the garbage collector add cleared weak references to a queue. You can then eliminate the linear scan of the resource table by keying it on the weak reference.
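
Python happens to have weak references in its standard library (the weakref module), so the basic table-scanning scheme can be sketched directly. This is an illustration of the idea, not a proposal for Suneido's implementation; ResourceTable, Report and the integer handle are invented for the example.

```python
import gc
import weakref

class ResourceTable:
    """Table of (weak reference to owner, handle, release function) entries."""
    def __init__(self):
        self.entries = []

    def add(self, owner, handle, release):
        # The weak reference does not keep the owner alive.
        self.entries.append((weakref.ref(owner), handle, release))

    def sweep(self):
        """Release resources whose owner is gone; return how many were released."""
        remaining = []
        released = 0
        for ref, handle, release in self.entries:
            if ref() is None:          # weak reference has been cleared
                release(handle)
                released += 1
            else:
                remaining.append((ref, handle, release))
        self.entries = remaining
        return released

class Report:
    """Hypothetical owner of a resource, e.g. a report format."""
    pass
```

The entry holds the handle value rather than the owner, so the release function can still run after the owner itself has been free'd. The queue enhancement would replace the linear sweep; in Python that role is played by the optional callback argument of weakref.ref.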

A form of weak references can also be used to reference memory objects that can be free'd if memory space is required, for example, a cached copy of an external file. Java calls these "soft references". The garbage collector treats these references slightly differently in that it only free's the objects if memory space is running low.

For more information on weak references in Java see: Java finalization

30 Nov 2000 (updated 30 Nov 2000 at 20:32 UTC)

I've been working on documentation for the last little while and I needed a break and some real coding. So I decided to make semicolons optional in the Suneido language. I've been toying with this idea for a while and over my coffee at Tim Hortons one morning, I worked out how to do it and it seemed pretty straightforward. In the end, it took me about half a day to get it 99% working, and another day and a half, and several complete rewrites, to get the last 1%. Typical. On the positive side, the current version is a lot cleaner than the initial version.

One thing I was afraid of was that the changes would "break" a bunch of the existing code in stdlib. So I wrote a quick function to syntax check all the records in stdlib.

    {|x| try x.text.Eval();
    catch (e) Print(x.name $ " - " $ e); }

Then I put this in a text file so I could run it even if there were too many errors to get to the WorkSpace. (With the Print changed to output to a text file.)

My first stab at the changes resulted in about 500 records with syntax errors. After fixing a few blatant issues, it was down to about 30. A few more fixes and voila, no errors. But that was on code that had semicolons on every statement. When I started to test examples without semicolons I ran into a bunch more problems that took quite a bit longer to fix.

I tried to follow a good refactoring approach, although, technically this wasn't refactoring because I was changing functionality. But I was preserving all the existing functionality. The basic plan was

1. Change the scanner to return a NEWLINE token instead of a WHITESPACE token for any run of whitespace that contained a linefeed or return. Then change the parser to ignore NEWLINE tokens.

This should not have changed anything - and it didn't. All my tests ran, and no syntax errors were introduced.

2. Add a variable to track nesting of (), [], and {} and ignore NEWLINE tokens if inside one of these. Also skip NEWLINEs after binary and ternary operators.

Seems simple but this ended up taking quite a bit of fiddling to get right. At first, I was adjusting the nesting counter in various parsing methods. (Suneido uses a recursive descent parser so there is a method for each grammar construct.) But this got pretty ugly, so I ended up counting (), [], and {} in the method (match) that reads new tokens from the scanner. After this breakthrough (which, of course, seems obvious now) it went fairly smoothly.
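
A toy version of the rule fits in a short Python function. This is not Suneido's scanner or parser: the token pattern, the operator set, and the function name are all assumptions for the illustration, and it only splits top-level statements. The nesting counter is adjusted in the one place where tokens are consumed, as described above.

```python
import re

# Newline, other whitespace, words/numbers, string literals, or any single character.
TOKEN = re.compile(r'\n|[ \t\r]+|\w+|"[^"\n]*"|.')

OPENERS = set("([{")
CLOSERS = set(")]}")
# Assumed set of single-character operators that expect a right-hand operand.
CONTINUATION_OPS = set("+-*/$=<>?:,")

def split_statements(source):
    """Split source into statements, treating newlines as optional terminators."""
    statements, current, nesting = [], [], 0

    def end_statement():
        if current:
            statements.append(" ".join(current))
            current.clear()

    for tok in TOKEN.findall(source):
        if tok == "\n":
            # A newline ends the statement only outside brackets and
            # when the previous token is not an operator awaiting an operand.
            if current and nesting == 0 and current[-1] not in CONTINUATION_OPS:
                end_statement()
            continue
        if tok.isspace():
            continue
        if tok == ";" and nesting == 0:
            end_statement()
            continue
        if tok in OPENERS:
            nesting += 1
        elif tok in CLOSERS:
            nesting -= 1
        current.append(tok)
    end_statement()
    return statements
```

Feeding it a line that ends with an operator keeps the statement open across the newline, while a line that begins with an operator starts a new statement, which is the behaviour behind the changed interpretations described next.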

When I released this version internally, we found two records in stdlib and a few more in other libraries, where there were no syntax errors, but the new interpretation was different from the old interpretation. For example:

    return
        ... ;

used to be one statement, but now was two, i.e. return ; ... ;

Another case was:

    s = s

which was now three statements instead of one. The solution is to put the operators at the end of the lines instead of at the beginning, which is our normal style guideline anyway.

    s = s.

Overall, I'm pretty happy with this change. Personally, I'm so used to having to have semicolons in C and C++ that it's not really an issue for me. But I have noticed it can be a problem for beginners. And if you don't need them, why require them?

The next step on this path is to make braces optional and use indenting instead, like Python. One of these days when I need another fix of serious hacking...

Andrew McKinlay
Suneido Software

Just added blocks to Suneido. This is a Smalltalk idea. Ruby also has blocks although in a little different way from Smalltalk.

Basically, a block is a chunk of code within a function, that can be called like a function, but that operates within the context of the function call that created it (i.e. shares its local variables). But the cool part is that a block can outlive the function call that created it, and when it does so, it keeps the context (set of local variables) alive. For example:

    make_counter = function (next)
        { return { next++; }; };
    counter = make_counter(10);
    counter();
        =>  10
make_counter returns a block. The block returns next++.
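
For readers more familiar with closures than Smalltalk blocks, the same behaviour looks like this in Python. This is an analogy, not Suneido code; the names are chosen for the example.

```python
def make_counter(next_value):
    """Return a function that shares and updates make_counter's local variable."""
    def counter():
        nonlocal next_value
        current = next_value   # post-increment: return the old value...
        next_value += 1        # ...and leave the incremented one behind
        return current
    return counter

counter = make_counter(10)
first = counter()
second = counter()
```

The local variable outlives the call to make_counter because the returned function still refers to it, so successive calls keep counting up from where the last one left off.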

At first, it seemed like a tough thing to add. One of the big issues is that function call contexts (Frames) are kept on a stack because this is much faster than on the heap. But that won't work for blocks that outlive their function call. Smalltalk handles this by allocating contexts on the heap, but this is much slower than a stack. Then I remembered a discussion of this in "The Design and Evaluation of a High Performance Smalltalk System" by David Ungar. The idea was to use a stack for contexts, but to catch the cases where a block outlived its creator, and to move the context to the heap. This is what I did for Suneido. There were only two cases I had to handle - where the block was returned from the function, and where the block was assigned to the member of an object. As far as I can tell, those are the only ways for a value to survive a function invocation. It took about a day to add this, with a little refactoring of existing code, and of course, a test for it.

So now Suneido has blocks. Of course, nothing uses them yet, but I have some ideas.
