Sometimes the little things make a huge difference when it comes to scalability. I changed a hashtable to use 64-bit checksums as keys instead of arbitrary strings (using the first 32 bits as the hash code). It may seem a bit silly, but it makes a huge difference when there are 4 million entries of more than 50 characters each. I feel dumb for not spotting it earlier. I know there is a chance of a false collision with checksums, but it's extremely low with 64-bit ones, especially since the data is split among 800 hash tables of fewer than 10,000 entries each. Also, moving logging to different disk spindles than swap can be important, especially if your process consumes a fair amount of virtual memory. I'm sure some people are wondering what this program is.
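For anyone curious what the checksum-key trick looks like, here is a rough Python sketch of the idea, not the actual code from my program. The names `checksum64`, `hash_code32`, `ChecksumTable` and `collision_probability` are made up for illustration, and truncated BLAKE2b is just an assumed stand-in for whatever 64-bit checksum you prefer. The table stores only the 64-bit checksum, never the original string, and picks a bucket from the top 32 bits. The last function is the usual birthday-bound approximation for the chance of a false collision.

```python
import hashlib


def checksum64(key: str) -> int:
    """64-bit checksum of a string key.

    Assumption: BLAKE2b truncated to 8 bytes stands in for the real checksum.
    """
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hash_code32(checksum: int) -> int:
    """Use the first (most significant) 32 bits of the checksum as the hash code."""
    return checksum >> 32


class ChecksumTable:
    """Hypothetical chained hash table keyed by 64-bit checksums, not strings."""

    def __init__(self, n_buckets: int = 1024):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, checksum: int) -> list:
        return self._buckets[hash_code32(checksum) % len(self._buckets)]

    def put(self, key: str, value) -> None:
        c = checksum64(key)
        bucket = self._bucket(c)
        for i, (stored, _) in enumerate(bucket):
            if stored == c:
                # Same checksum: treated as the same key (this is where a
                # false collision would silently overwrite another entry).
                bucket[i] = (c, value)
                return
        bucket.append((c, value))

    def get(self, key: str, default=None):
        c = checksum64(key)
        for stored, value in self._bucket(c):
            if stored == c:
                return value
        return default


def collision_probability(n_entries: int, bits: int = 64) -> float:
    """Birthday-bound approximation: about n(n-1)/2 pairs out of 2**bits values."""
    return n_entries * (n_entries - 1) / 2 / 2 ** bits
```

Plugging in my numbers, `collision_probability(10_000)` gives the approximate odds for a single table and `collision_probability(4_000_000)` gives the odds if everything lived in one table. Keep in mind that Python integers are objects, so this sketch shows the structure rather than the memory savings you would get from a fixed-width 8-byte key in a lower-level language.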
I am now reading Extensibility's Schema Adjunct Framework. It's an interesting proposal for including extra information alongside XML schemas (which aren't very extensible) that can be used by programs. This is an intriguing concept that might prove rather useful for another project here at work.