Whew, what a weekend. Parties, poking some people's bellies with chopsticks (you read that right), our dog got sick, etc, etc... My head is spinning.
Threads, state machines and I/O
mbp, could you confirm the source for that quote from Ulrich? I tried googling it, to no avail... Thanks!
By the way, I think you are right on about explicit vs implicit sharing. As far as "performance hack" goes, threads are faring very poorly, breaking cache lines on a single processor and increasing inter-processor chatter on SMP. I doubt there is any case of such a silly OS that would also support NUMA, but having a thread migrated to a "far" CPU would probably be ultra-painful.
One of the problems is that we don't have asynchronous I/O on (normal) Unix. We only have synchronous I/O, with non-blocking mode in some cases (support for file-based file descriptors is conspicuously missing). You can go some distance with that, a bit more if you play clever tricks, but asynchronous I/O is wanted. Thanks to bcrl, this is coming in the next major Linux release. Yay!
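To make "synchronous, with non-blocking in some cases" concrete, here is a minimal sketch in Python on a pipe. The helper name try_read is made up for illustration. It only shows the non-blocking part: the call returns right away instead of waiting, but you still have to come back and ask again later, which is not the same thing as real asynchronous I/O.

```python
import os

def try_read(fd, max_bytes=4096):
    """Return whatever bytes are available on fd right now, or None if reading would block."""
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:      # EAGAIN / EWOULDBLOCK from the kernel
        return None

read_end, write_end = os.pipe()
os.set_blocking(read_end, False)   # sets O_NONBLOCK on the descriptor

first = try_read(read_end)         # nothing has been written yet: would block
os.write(write_end, b"hello")
second = try_read(read_end)        # the data is there now

os.close(read_end)
os.close(write_end)
```

Here first ends up as None and second holds the five bytes that were written.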
lukeg: Like Ingo Molnar, I think it comes from Win32, which has really poor performance characteristics for starting processes. See his answer to question #6 of this interview on Slashdot. Ideas like threads for GUI applications that are not CPU intensive are, in my opinion, just stupid, as you and Ousterhout (through async) point out. In such applications, you always want to do something in reaction to an event, even if it is only the passing of time!
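For what I mean by reacting to events, including the passing of time, here is a rough single-threaded sketch. The function run_loop and its parameters are invented for this example, and a pipe stands in for whatever the real event source would be (an X connection, a socket, and so on).

```python
import os
import selectors

def run_loop(read_fd, tick_seconds, max_ticks):
    """Single-threaded loop: handle input when it arrives, otherwise handle a timer tick."""
    log = []
    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)
    ticks = 0
    while ticks < max_ticks:
        ready = sel.select(timeout=tick_seconds)
        if ready:
            data = os.read(read_fd, 4096)
            if not data:             # writer closed its end: no more input
                break
            log.append(("input", data))
        else:                        # select timed out: time is the event
            ticks += 1
            log.append(("tick", ticks))
    sel.close()
    return log

r, w = os.pipe()
os.write(w, b"click")
events = run_loop(r, tick_seconds=0.01, max_ticks=2)
os.close(r)
os.close(w)
```

The loop sees the pending input first, then two timeouts. A real loop would compute the timeout from the next deadline rather than use a fixed value, but the shape is the same: one thread, one place where it waits.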
MichaelCrawford: a single CPU is a state machine, two CPUs are two state machines, therefore, to use two CPUs, you have to run two state machines concurrently. Basically, the optimal setup is one "execution context" (be they threads or good old processes) per CPU.
Some people might point out Intel's Hyperthreading as a case where multiple threads per CPU might make sense, but this is false. There are a lot of shared resources involved in this hyperthreading feature (resources like execution units and floating point units are not duplicated), and in particular, there is one shared cache for all the threads on a particular CPU (so you have multiple threads with possibly very different code and data battling for cache space, aka "thrashing"). Hyperthreading, IMHO, makes running multiple threads per CPU better than without that feature, but not better than having a single efficient thread per CPU (it's a kind of workaround).
The differences between simply writing a multithreaded program and having multiple independent state machines are several. Don't share all of the virtual memory space, and use explicit messages to cause state changes (through IPC). Every time that you write to shared memory on an SMP system, the processor doing the writing sends a cache line invalidation to the other CPUs' caches. If you use implicit sharing (threads), you never know: even if you take great care not to share data structures between the threads, you might have a counter variable that you update really often sitting in the same cache line as a data structure that is used in another thread (on another CPU), causing no end of inter-processor traffic and killing the cache. With explicit sharing (processes with shared memory, for example), this doesn't happen by accident, only if you really update memory that then has to be invalidated on the other CPU.
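The accidental sharing described above (usually called false sharing) comes down to simple address arithmetic, which the sketch below works through. The 64-byte unit is an assumption (a common cache line size, but check your hardware), and the layout with an 8-byte counter and a 40-byte structure is made up for the example.

```python
LINE_SIZE = 64  # assumption: bytes per coherence unit (a typical cache line)

def shares_line(offset_a, size_a, offset_b, size_b, line_size=LINE_SIZE):
    """True if the two byte ranges touch at least one common line."""
    first_a, last_a = offset_a // line_size, (offset_a + size_a - 1) // line_size
    first_b, last_b = offset_b // line_size, (offset_b + size_b - 1) // line_size
    return first_a <= last_b and first_b <= last_a

def pad_to_line(offset, line_size=LINE_SIZE):
    """Round offset up to the next line boundary."""
    return (offset + line_size - 1) // line_size * line_size

# Hypothetical layout: a hot 8-byte counter followed directly by a 40-byte structure.
counter_off, counter_size = 0, 8
struct_off, struct_size = 8, 40
accidental = shares_line(counter_off, counter_size, struct_off, struct_size)

# Same structure, pushed to the start of the next line.
padded_off = pad_to_line(counter_off + counter_size)
fixed = shares_line(counter_off, counter_size, padded_off, struct_size)
```

In the first layout the counter and the structure land on the same line, so every counter update invalidates the structure on the other CPU. After padding they no longer overlap. With threads you have to remember to do this everywhere; with separate processes the only memory that can collide is what you deliberately mapped as shared.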
So using multiple processes (rather than threads, and still only one per CPU) and using message passing (can be optimized using shared memory, if pipes or Unix domain sockets are too slow for your taste) is best.
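A minimal sketch of that setup, assuming a Unix system with fork: one worker process per CPU, each a tiny state machine whose only input is messages on a Unix domain socket. The text protocol ("add N", "report", "quit") and the names worker, spawn_worker and parallel_sum are invented for illustration.

```python
import os
import socket

def worker(sock):
    """Child state machine: its state (total) changes only in response to messages."""
    total = 0
    with sock, sock.makefile("r") as inbox:
        for line in inbox:
            cmd, _, arg = line.strip().partition(" ")
            if cmd == "add":
                total += int(arg)
            elif cmd == "report":
                sock.sendall(f"{total}\n".encode())
            elif cmd == "quit":
                break

def spawn_worker():
    """Fork one worker; return its pid and the parent's end of the socket pair."""
    parent_sock, child_sock = socket.socketpair()   # Unix domain sockets
    pid = os.fork()
    if pid == 0:                                    # in the child
        try:
            parent_sock.close()
            worker(child_sock)
        finally:
            os._exit(0)                             # never return into the parent's code
    child_sock.close()
    return pid, parent_sock

def parallel_sum(numbers, nworkers=None):
    """Spread the numbers over one worker per CPU and add up their partial sums."""
    nworkers = nworkers or os.cpu_count() or 1
    workers = [spawn_worker() for _ in range(nworkers)]
    for i, n in enumerate(numbers):
        _, sock = workers[i % nworkers]
        sock.sendall(f"add {n}\n".encode())
    total = 0
    for pid, sock in workers:
        sock.sendall(b"report\nquit\n")
        with sock, sock.makefile("r") as replies:
            total += int(replies.readline())
        os.waitpid(pid, 0)
    return total
```

Calling parallel_sum(range(1, 101)) gives the same answer as sum(), only with the work split across processes that share nothing by accident. For real workloads you would batch the messages, or move the bulk data into a shared memory segment and keep the socket for control messages only.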
Ah, so many things I'd like to do to and with XPLC. Talking about the I/O, state machine and multithreaded stuff kept reminding me. I'm still working on it, but only occasionally (because I'm on a project with tight deadlines at work), but I'll be back. The good thing with projects that have tight deadlines is that you (roughly) know when you'll be done. ;-)
And I'll be back with a vengeance! Seriously, I've got some pretty cool stuff going with XPLC, I just have to do the grunt work of coding, since I have been spending a lot of time designing and thinking about how to do things in little pieces of spare time, like when I ride the subway.
XPLC is so neat, there just won't be any excuse not to use it!