Older blog entries for Kalana (starting at number 5)

I think I have another solution to andrei's random elements from a linked list thing.. It's actually pretty similar to Ankh's solution, but you don't need to makeAscendingSequenceOfRandomNumbers, and it has the same restriction that you need to know the number of items, and order doesn't matter. I'm not quite sure mine works though.

When you have a list of n elements and want to pick m, you want each element to have a probablilty m/n of being picked. Walking the list and picking elements where each element's probability of being picked is m/n doesn't work, because you're not guaranteed to end up with m elements.

However, if you pick the first element (with probability m/n), you've changed the problem to picking m-1 elements out of the remaining list of size n. And if you don't pick the first element, you've changed the problem to picking m elements out of the remaining list of size n-1. Pseudocode, recursive:

pickFromList(list L,
             int nWanted,  /* number of elements you want to pick */
             int nItems)  /* size of the list */
  if(nWanted == 0){
  }else if(nWanted == nItems){
    pick every element in L;
    if(drand48() < nWanted/nItems){ /* drand48 gives Uniform(0.0, 1.0) */
      pickFromList(nWanted-1, nItems-1, L->next);
      pickFromList(nWanted, nItems-1, L->next);
Pseudocode, iterative:
pickFromList(list L, int nWanted, int nItems)
  int nPicked, nSeen;
  for(nSeen=0, p = L; p != NULL; p = p->next, nSeen){
    if(drand48() < (nWanted-nPicked)/(nItems-nSeen))
(probably a few off by ones in there)

Handwaving to show that the above works:
pick 2 out of list of 5 elements: a, b, c, d, e

ac = the event that a is chosen
bc = the event that b is chosen
A = probability that a is chosen
B = probability that b is chosen

A = 2/5

B = 1/4 if ac      = 1/4 * 2/5 + 2/4 * 3/5 = 2/20 + 6/20 = 2/5
    2/4 if !ac

P(a chosen   &   b chosen) = P(bc  if ac)  * A     = 1/4 * 2/5 = 2/20
P(a chosen   & ! b chosen) = P(!bc if ac)  * A     = 3/4 * 2/5 = 6/20
P(! a chosen &   b chosen) = P(bc  if !ac) * (1-A) = 2/4 * 3/5 = 6/20
P(! a chosen & ! b chosen) = P(!bc if !ac) * (1-A) = 2/4 * 3/5 = 6/20

C = 0 if   ac  & bc     = 0            = 2/5
    1/3 if ac  & !bc     + 6/20 * 1/3
    1/3 if !ac & bc      + 6/20 * 1/3
    2/3 if !ac & !bc     + 6/20 * 2/3

Here's my diary entry for May. Time between entries is exponentially increasing.

Oh, if you're annoyed at netscape/mozilla/whatever text input things, lynx allows you to invoke an external editor for textareas. I know it has security holes, but I know of no other text browsers with that feature (or that allow turning off table and frame rendering. sure, it's fine for some sites, but others' table-layout gets mangled beyond recognition, and it would be nice to actually be able to read the content). Maybe someday I'll take lynx, links, or w3m and add/remove stuff until I get my ideal web browser..

On the Worldforge front, the Atlas protocol library people have apparently been doing "cool" stuff, so I'm holding back on client work until that stabilizies. In the meantime, I've written a nntp news reading cgi (haven't done threads yet. not exactly sure how threaading works, actually.) and did a uml diagram which I'm sure would be a source of amusement and/or confusion to someone who actually knows about uml than what can be learned from the tooltips in dia. I've also been working on a paper about character communities and implicit knowledge sharing.. might end up using mod_virgule to test out some of my ideas.

Ok, that's all. The machine I'm typing this on is promising to reboot in 5 minutes.

Yay, another diary entry. Actually, I consider averaging once a week is good enough for me.

I agree with mbp, sometimes the diaries seem more like yelling into a huge room. Of course, you can't see into it, and the only reason to think anyone's listening is that occasionally, someone shouts back. Interesting how people use their diaries instead of the article comments to discuss articles, too.

Maybe if advogato had sections for articles, like "Technical", "Philosophy", or "Social", there would be less complaining and anti-certifying about off-topicness.

While on the topic of meta-osfs[1], I think it would be interesting to document how specific large open projects (more than a few dozen dedicated developers/contributors) work in terms of coordination and project management. This kind of thing seems to be reinvented by the individual projects. While, obviously, something that works for one project doesn't necessarily work for another, sharing insights and experiences might be helpful, and just interesting in its own right for the armchair anthropologists. I suppose everyone who knows is too busy working on their respective projects, though.

Does anyone actually read the TOS for Sourceforge? IANAL, but some of the stuff looks a little strange, like 6a, which covers "any Content" that is "harmful", "vulgar", "abusive", or "otherwise objectionable". Does stuff like "Hey, you suck", or "This code is a POS" count? And 6i, which states that you can't "cause a screen to 'scroll' faster than other users of the Service are able to type". When debugging stuff on IRC, I often paste a few screenfuls of logs or compiler output, definitely "faster than other users of the Service are able to type". Am I allowed to do that? Actually, I'd say any heated discussion in realtime with more than a few participants goes faster than I can type (in that I wouldn't be able to log the conversation by typing it. It doesn't seem terribly well defined. (does "faster than other users are able to type have a legal definition?)). How about 6h? I can't "transmit any material that contains ... files or programs designed to interrupt destroy or limit the functionality of any computer software...". When doing software devlopment, there are often legitimate reasons to do so, such as posting "Hey, I crashed $PROGRAM, here's how", or "I saw an exploit for $PROGRAM on bugtraq, we should fix it". Note that this isn't even restricted to public forums, so you technically can't even do this on, say, a private developer mailing list. I hope I'm just being unreasonably paranoid.

[1] Open Source and/or Free Software, because I don't want to make a distinction between them in contexts where there is none.


Web site's finally back up today. I actually had a few people email me about it over the weekend, which hadn't happened before (I suppose they just read the website..). I never knew people outside the project had actually heard about us. I suppose my indiscriminate meme scattering is working <Laugh Role="diabolical">

We've decided to use DocBook for our whitepapers and the Circe ruleset, so I've been trying to learn it, mostly from examples. I've pretty much got the basics down, so now I'm trying to figure out how to get the rest of the project to learn how to write it. Docbook: TDG seems a bit too deep for those unfamiliar with SGML, even hackers, and Get Going With DocBook: Notes for Hackers is great, but the sections containing only the word "dude" detract from it just a bit. I looked at the LDP's Using DocBook HOWTO, and didn't particularly like it either. I took a look at the OSWG's PID project, which looks like just what I need, but it's not done yet (and doesn't look like it's very active). There has got to be a good introduction to writing DocBook for the intelligent SGML-newbie. Hasn't there?


Looked at the site in color for the first time (since lynx is just better when one actually wants to write something of any length. btw, anyone know if mozilla will allow spawning an external editor for textareas? I'd do it myself, but mozilla seems too... huge to get into for something so small). Nifty. One interesting thing I just noticed is that the comments and articles are mostly posted by Journeyers, wheras the diary entries are evenly split among Master, Journeyer, and Apprentice. I suppose the Apprentices are somewhat intimidated. Not sure if that's good, bad, or none of the above.

It'd be interesting if advogato's diary engine allowed replies to entries. Seems like it would fit into the whole theme of "community site for free software developers". Of course, a diary is different from the main articles page, so the owner of the diary should have more control over it. Perhaps the ability to delete postings, or to restrict them to a set of trusted accounts (maybe adapt the site wide trust metric to a personal trust metric by using yourself as the seed instead of the seed accounts).

Enough rambling about that. But life sucks anyway, and I don't feel like elaborating on that in public.

I discovered advogato today (or possibly yesterday, depending on the time zone it uses). Looks great!
On other subjects, I've been working on porting the song Supercalifragilisticexpialidocious to UNIX, ie. by changing the choruses to things like: "sudo cal egrep vi regexp ps chmod fsck". Of course, every one of the 8 occurances of Supercal... will be replaced by a different string of commands, so this is just a bit non-trivial. OTOH, it would be pretty cool if each of those lines was an actual command. But the only way I can see that working is if literally every other word is |. This seems like a pretty obvious idea, but I haven't found any references to it. If you've seen this done before, please tell me (does advogato have my email on my page? I guess I'll put it in my Notes secion if not).

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!