Older blog entries for wli (starting at number 7)

Looks like openlogging.org fell off the net and I can't bk commit anything, which is somewhat painful as I'd like to avoid batching unrelated things together in a given changeset.

The bugreporter seems to have indicated that the rmap13 bug was created by an independent patch used in combination with it. What a relief!

A number of strategies seem to have surfaced for dealing with kva exhaustion:

  1. making kernel/user address spaces disjoint
  2. dynamically mapping large data structures
  3. reserving a region of per-process kva for windowing potentially large things predominantly accessed from the context of its creator
  4. shrinking the size of various data structures
  5. shrinking caches more aggressively
  6. reserving a larger global windowing region
  7. reserving per-cpu windowing regions with scheduler support
  8. daemons parked in front of large structures stealing the user portion of the address space for windowing their oversized data structure
  9. statically reserving per-process kva for dynamically mapping things like pagetables with strong process affinity
  10. doing nothing whatsoever and using 64-bit hardware instead (supposedly the preferred course of action which is not really acceptable to those I'm helping)

Highmem is really evil.

Making fork() not copy ptes for file-backed vmas seems to have some very difficult-to-trace issues. As best as I can tell things somehow end up faulting in garbage.

Poking around fault paths and such has led me to consider beautifying them somewhat by fleshing out the segment-driver-like approach and doing some pure non-semantic beautification of the rbtree manipulation code. I might also try using a different kind of tree if I feel like going in for the long haul. Not that I haven't done that before.

pagemap_lru_lock things are going well. I'm just taking it slowly from here so that any bugs I may have missed don't show up in larger groups than can be effectively handled, and I'm not feeding anything to riel until the issues from the last batch sent to him have been handled. There's an annoying one I'd like to get a grip on where the bugreporter has vaporized and I can't get anything close to useful info as to what happened. Short of literally flying out to meet the guy, sitting on his doorstep until he reappears, and borrowing his box to debug on (which isn't going to happen), I'm not sure what can really be done.

It looks like rmap is getting close to (or at) parity on NUMA-Q now with a rollup of the pending changes, so I'm slowing that down and keeping things stable. Now I'm helping to chase highmem stability issues and have some fork() efficiency issues queued at a lower priority. Highmem is evil. Very evil. It's going to take a bunch of us grinding away at it full-time to make this stuff work. The niceness of direct-mapping the kernel virtual address space turns into a kva-exhaustion horror beyond imagination as dynamic:direct ratios go up. There will be much pain.

Trying to debug the races with the pagemap_lru_lock breakup all week. Mostly just singlestepping and trying to debug the simulator. Nothing to see here, move on.

So nothing really got done this weekend. It wouldn't have helped to run the benchmarks without the analysis code being ready, since the whole point was to use the benchmarks as testcases for it. Reimplementing math libraries that have no free equivalents is a big PITA.

I got a real profile instead of a description and the signs point to too many calls to add_timer(), mod_timer(), and del_timer() as opposed to cache-blowing in cascade_timers(), which surprised me but relieves me of the burden of writing the umpteenth priority queue. It also appears to be specific to ip_conntrack which I'm not sure is one of my priorities.

Following the yellow brick profile...

mbligh managed to get some testing in on the pte_chain_freelist racefix, and it appears to survive booting and running some benchmarks. Per-zone freelists should now follow after poking around for further races in the rest of this round of auditing.

Looks like the rest of today will be spent on entertainment types of things. Maybe some code will come out late tonight.

Found a race in an audit of one of the pagemap_lru_lock breakups that appears to be common to all of them, but it's unclear whether it's the only one left. After the pagemap_lru_lock was broken up, the pte_chain_freelist, which is global, was left naked. Once I got past that one, one of the init_idle() races came out and I ran out of time on the machine.

Discovered in some additional testing that the incomplete gamma function for the chi^2 CDF had convergence problems when either a or x exceeds 7, so it appears that it will need the continued fraction expansion for that domain. The Kolmogorov-Smirnov CDF code is spewing complete garbage, and the other CDFs are on the back burner.

The queue is propagating things upward and downward and finding the right levels to put things at, but the stratified trees that are supposed to be what gets bubbled around still need to get plugged in. There appear to be some bugs with respect to using the right chain field at the right time with them. Basically when it's time to bubble the things down from one level to another, or when things need to be chained off each other by insertion in a tree of a deeper nesting level, something is getting the nesting level of the tree node wrong.

Found two more classes of bugs in another audit of the waitqueue code. One is that the leader against which other waiters on an object are chained is not actually considered an element of the list by the list_head routines, only a sentinel, which caused the reference to it to be lost. The other is that some of the code assuming it had a unique reference to the queue wasn't actually removing it from the comb list.
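
The sentinel problem is easy to reproduce outside the kernel. The following is a small userspace sketch of a circular doubly-linked list in the list_head style, not the kernel's own code; struct node, node_init, node_add_tail, and count_visited are names invented for the illustration. Walking the list from the head visits everything chained onto it but never the head itself, so a waiter used as the head silently drops out of any traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Intrusive circular list link, in the style of list_head. */
struct node {
    struct node *next, *prev;
};

/* An empty list is a node pointing at itself. */
void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

/* Link item in just before head, i.e. at the tail of head's list. */
void node_add_tail(struct node *item, struct node *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/*
 * Count the nodes a head-based traversal visits. The loop starts at
 * head->next and stops on reaching head, so head is never counted.
 */
size_t count_visited(const struct node *head)
{
    const struct node *pos;
    size_t n = 0;

    assert(head != NULL);
    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```

If a leader and two other waiters are linked this way with the leader as the head, count_visited on the leader reports two, not three.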

Some more people I've never heard of chimed in on the hashing thread and put in a few words to perpetuate the confusion between "random" and "uniformly distributed". This is never going to end. I didn't actually bother answering the post because I'm going to hold out until I have some hashtable analysis code others can use to reproduce my results and speak in numbers. As long as it's rooted in terminology and anecdotes no one will ever admit what's going on.

Minimal progress on the stratified trees but some pointy haired issues came up that distracted me for a while.

Reviewed some small changes from Sam Ortiz that look pretty as far as getting SGI's discontigmem stuff to play happily in combination with removing ->virtual. It's not clear whether it will perform well as it apparently takes some doing (in terms of CPU cycles, the code is not that bad) to remap mem_map array indices to page frame number offsets. Hopefully that won't be too bad, but if it is, ->virtual is #ifdef'd and can be brought back by that method.

Tried to take a harder look at the discontigmem thing itself but there's quite a bit there to wade through. I think I'll be waiting for the separated patch so I don't need to guess at which chunk corresponds to which feature it's trying to implement. If it were smaller (which I'm not sure it can be) it'd be easier to get a full picture, but it'll only get smaller by becoming multiple patches.

Finally got a look at Pat's NUMA-Q discontigmem patch and was very impressed: the code was very clean and very readable. I'll have to take a harder look to be sure I've done due diligence with respect to it not breaking other things, but it's very nice.

The hashing flamewar apparently degenerated to the name-calling level, though the name-caller does not have a particularly good reputation. I don't care. I'll continue collecting hash table metrics and their measurements from test runs. Sounds like I might be having a benchmark weekend. Again.

Signed up for advogato after some guy posted a lame spelling flame in response to my kerneltrap interview and then slapped a silly thing about it into his diary here. Looks like an interesting way to log the various things I'm doing.

Did some more tweaking for code cleanliness of the treap implementation for the bootmem patch and some more testing on segment_tree.h. The main issues are mostly testing and purifying the code to eliminate macros. Page stealing and type tagging are still pretty far off. Bootmem is really a background task.

The waitqueue vs. printk bootstrap ordering issue doesn't appear to be the only one. My suspicion now is an error committed while doing insertions on the comb lists corrupting the structure of the collision chains.

Stratified trees appear to be a bit of work to think out properly but it looks like when the code is boiled down to its essence it will actually be very simple.

Laid out some C code to get the Lyapunov exponents of hash functions and the entropy of the bucket distribution. Also started translating some of the other scripts I wrote to C. It'd be a little easier if there were something to check it against but for the entropy at least there are some analytically calculable results whose input distributions will work as test cases. For the Lyapunov exponent some more reading may be required. I seem to be having a little trouble getting the C version of the Anderson-Darling statistic's CDF to converge too, but that's a bit much for one day. This is pretty easy stuff to do but actually running the tests to get results to analyze the statistics of is so time-consuming it doesn't feel like it will ever get a decent timeslice.
