Older blog entries for kelly (starting at number 91)

Citizendium plagiarizes?

It seems that Citizendium, despite its fancy claims, is not above plagiarism: compare this image on Citizendium with this image at Commons.  Note the stunning lack at Citizendium of any credit to Magnus as the source.  Not only is this copyright infringement, but it's morally dishonest for Citizendium and its contributors to take credit for work they did not do.

Shame on you, Larry.  I would have expected better of you.

Syndicated 2007-03-25 21:03:16 (Updated 2007-03-25 21:01:50) from Nonbovine Ruminations

The demographics of Wikipedia

Reading Geoff Burling's recent post on "Age and Wikipedia" got me thinking about age, and about other demographics, in the Wikipedia communities.  However, I think the behavior Geoff is writing about is not entirely a symptom of age (although certainly many of the people expressing the behavior in question are teenaged boys, as I've commented on before), but rather of psychological characteristics that Wikipedia selects for, combined with the simple fact that there are a lot of teenagers (boys, mainly) with scads of free time to burn on the Internet.

A few days later, I saw this, from Language Log, about a college freshman who credits Wikipedia for his passion for linguistics.  And Wikipedia probably does attract significant numbers of people who first discover it while scratching some knowledge itch.  I seem to recall that my first encounter with Wikipedia was due to an interest in mathematics.  Of course, only a fraction of those who find Wikipedia as a reference source go on to be even so much as a casual editor, let alone a dedicated editor, and I doubt that anyone has anything better than a wild guess as to the conversion rates there.

And this brings me to the major annoyance in discussing demographics of Wikipedians: there are no meaningful demographics about Wikipedians.  About the only subgroup of Wikipedians for which there is even a hope of meaningful demographics is that group which goes to Wikimania.  The culture of anonymity on Wikipedia is so strong that many admins, and quite probably a majority of editors, do not reveal even basic demographic information about themselves.  There are a number of voluntary surveys (e.g. the "list of Wikimedians by age" on meta), but any statistician knows that self-selected surveys are problematic at best, and the response rate on these surveys is generally so low as to be useless for any meaningful purpose.  Even so much as estimating the number of distinct editors on Wikimedia projects is hard, because of anonymous editing (which results in multiple people being difficult to distinguish) and sockpuppets (which result in a single person appearing to be multiple people).  The English Wikipedia has millions of user accounts, but a rather large percentage of them were created for the sole purpose of vandalism.  The number of true, non-anonymous editors is simply not known.

So, while it is "common knowledge" that "Wikipedia is run by high schoolers", there really is not any objective basis for this statement.  At best, it's an intuitive guess extrapolated from very limited information.  Certainly there are high school students involved with Wikipedia, but I think the above-linked article about the passionate linguist is proof of why this can be a good thing.  Extrapolating from a few instances to the general case, however, is fallacious.  I would love to see real, meaningful statistical data on the demographics of Wikipedia readers, contributors, and community members instead of the current mishmash of wild guesses, extrapolations, and outright hyperbole that is sadly passing for fact in such discussions.

Syndicated 2007-03-25 18:04:39 (Updated 2007-03-25 18:03:28) from Nonbovine Ruminations

On moving

I do not enjoy moving.  It is a very time-consuming process.  It is a process that does not leave one time to write interesting things in one's blog.

Syndicated 2007-03-23 13:51:29 (Updated 2007-03-23 13:49:55) from Nonbovine Ruminations

Notability, maintainability, and quality

Sage Ross reports (in his blog, at "Wikipedia and Notability") that the community is unhappy with the current definition of notability. I've touched on notability before, in the limited context of webcomics (see Webcomics and Wikipedia and On Webcomics, again). As Sage notes, "notability" has always been a contentious issue in Wikipedia, and there is indeed currently a dispute over what, if anything, "notability" should mean.

In the interest of disclosure, I will reveal that I am an eventualist inclusionist mergist. (I do not consider the latter two mutually contradictory.) My experience, in the somewhat over two years that I've been involved with Wikipedia, is that the scope of what constitutes "acceptable content" for inclusion in the encyclopedia has consistently broadened over time, although certainly in some areas (such as webcomics) there have been pushbacks. A good example of this trend is high schools. When I first started at Wikipedia, in late 2004, very few high schools had articles, and most attempts to create one were met with a rather quick deletion on the basis of being "not notable". By 2006, it was generally accepted that high school articles were not subject to being deleted on the basis that they were "insufficiently notable", and today nobody (except for the most hardcore deletionist) contemplates deleting a high school article for very long. Similar trends have seen individual articles on every Pokemon, articles on individual episodes of various television shows, and all sorts of other content that would likely have been summarily deleted in 2004 become generally accepted as appropriate content in 2007.

This is, in my belief, largely due to the fact that the people who feel the urge to remove what they feel is meritless content are simply outnumbered by the people who would create such content. There has not, in most cases, been any conscious decision by the Wikipedia community (if in fact that entity is capable of making decisions, which I rather highly doubt) that articles on individual episodes of the Simpsons are appropriate for inclusion; rather, the articles were created by dedicated Simpsons fans, and nobody with an eye for trimming the encyclopedia got to them quickly enough to effectively resist their presence, and so they, by default, became part of the accepted corpus. I see no reason why this trend would not continue, and so I expect that over time the margins of notability will continue to be pushed further and further back. I don't think that the margins will ever be pushed out completely to the point that (e.g.) the serial numbers of the dollar bills in my purse will merit their own articles (although it's not entirely out of the question, as many of them are catalogued already at wheresgeorge.com), but I think there's still a great deal of room for expansion and I expect to see Wikipedia expand into that space over the long haul.

The ongoing battle over webcomics seems to be the current exception to this trend, and I don't expect it to continue. Assuming that they don't give up, the webcomics fans will eventually win, as they simply outnumber the notability pruners. At the moment, the pruners are organized against webcomics, and they are assiduously defending that territory. However, the pruners are more subject to attrition in the ranks than the webcomics fans, and it is likely inevitable that too many of their faction will leave Wikipedia or be drawn off into some other battle (say, amateur sports leagues, or radio towers, or some other equally borderline area) and the resulting loss of active focus will let the webcomics fans win out. It's far easier, in most cases, to recruit people in favor of keeping content than it is to recruit those opposed to it.

So, rather than spending a lot of time refining the definition of notability, I would advise discarding it entirely. Notability is, in practice, a proxy for a large number of largely personal beliefs about what should be in an encyclopedia, beliefs for which there is no consensus within the Wikipedia community. Furthermore, those beliefs shift over time, and I believe that shift will tend toward broader inclusion. The problem with broad inclusionism is that it will inevitably lead to more articles than the Wikipedia community can effectively maintain. (It is difficult to deny that this has already happened.)

The problem with discarding notability is that immediately people will scream "But then we will have articles about what you had for breakfast yesterday". Well, no, we won't. (Although it might be interesting to have that data; I'm sure that there will be people in 2150 who will be interested in knowing about the dietary habits of early 21st century IT professionals. There are probably people in 2007 with that interest, for that matter.) I am not advocating having no standards at all; that would be irrational. Instead, the standards must reflect maintainability as the main consideration. A record of my breakfast yesterday (for the record, two glazed Dunkin Donuts and a bottle of Aquafina) is unverifiable, and thus unmaintainable, and thus unfit for inclusion in Wikipedia. Verifiability isn't enough for maintainability, but it's definitely a minimum characteristic.

This seems to be the general direction of the discussion that Sage refers to, although they're not characterizing it as maintainability, but instead attributability. I don't think attributability is enough. One of Wikipedia's largest problems right now is that it's larger than its community can effectively tend to. Wikipedia needs to aggressively limit its growth, at least in the short term, to give its community enough time to structure itself better to be able to handle the content it has now, to say nothing of the content it will acquire in the future. The problem with adopting attributability (or verifiability) as a minimum criterion for inclusion is that someone is going to have to check the cited sources for accuracy. Nobody is doing that now, except on a haphazard basis. Wikipedia has no process now for any sort of organized maintenance of the encyclopedia; even vandalism management is done haphazardly.

Quite frankly, I think it would be appropriate for Wikipedia to disable new page creation (except for admins, to deal with special cases) for an entire month and spend that month developing the infrastructure to better maintain both the articles it currently has and the new articles it'll gain once new page creation is reenabled. New page review needs to be systematic, not haphazard, and there need to be systems to ensure that every new page is looked at by at least one and preferably several experienced editors promptly after creation, both to properly categorize it (the stub sorters already sorta do this, but they do so in a far less useful way than they could) and to evaluate the article for what action the community needs to take with respect to it. And then the community needs to actually do those things.

There are currently 21,598 articles tagged as needing cleanup and 55,928 tagged as lacking sources. And I suspect that only represents about 20% of the articles that actually belong in those respective categories. These numbers are not falling with time; they are growing (a month ago, there were only 49,607 tagged as lacking sources). These backlogs reflect the rapidly declining overall quality of Wikipedia. The situation may already be out of control; if it is not yet, it likely will be soon. The problem is that the community largely seems not to care, and that really bothers me.
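To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the counts are the ones quoted above, the 20% figure is just my guess from the previous paragraph rather than a measurement, and the variable names are made up for illustration.

```python
# Counts quoted in the post (English Wikipedia, March 2007).
unsourced_now = 55_928        # articles tagged as lacking sources today
unsourced_month_ago = 49_607  # the same tag count a month earlier
cleanup_now = 21_598          # articles tagged as needing cleanup

# Assumption (the post's own guess, not a measured value):
# the tags cover only about 20% of the articles that deserve them.
tagged_percent = 20

# How fast the "lacking sources" backlog grew in one month.
monthly_increase = unsourced_now - unsourced_month_ago
monthly_growth_rate = monthly_increase / unsourced_month_ago

# Scale the tagged counts up to the guessed true totals (integer arithmetic).
estimated_unsourced_total = unsourced_now * 100 // tagged_percent
estimated_cleanup_total = cleanup_now * 100 // tagged_percent

print(f"Newly tagged as lacking sources in a month: {monthly_increase}")
print(f"Monthly growth of that backlog: {monthly_growth_rate:.1%}")
print(f"Guessed true number of unsourced articles: {estimated_unsourced_total}")
print(f"Guessed true number needing cleanup: {estimated_cleanup_total}")
```

If the 20% guess is anywhere near right, the real backlog is several times larger than the tagged one, and the tagged one alone grew by more than a tenth in a month.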

Deleting all unverified articles would be a good start. Not all at once, but a deliberate, systematic process to either source or delete those 55,928 articles would be a great start. Proper use of automation is critical to this, and I really think that's where Wikipedia needs to be concentrating its activities in the next year. It would be great if the Foundation would help to recruit the volunteers needed for this effort; the problem with the current community is that there don't seem to be enough people interested in this sort of work to get it done.

Syndicated 2007-03-15 22:01:00 (Updated 2007-03-16 00:16:51) from Nonbovine Ruminations

Cook County bans smoking

Cook County's public smoking ban went into effect today. This means that any place in Cook County not already subject to a community ordinance regarding smoking (basically, unincorporated Cook County and any city, village, or town which hasn't bothered to pass a more or less restrictive ban yet) is now subject to the county's rather broad public smoking ban, which includes both bars and restaurants.

The Cook County ordinance is similar to the Chicago one, except that it does not have Chicago's "air scrubber" exception for bars: all bars are now nonsmoking. This means that Possum's Pub, a little hole in the wall bar a block from our house, might actually be appealing: it's in unincorporated Cook County, which means that they will be nonsmoking and there's no village council that can change that short of annexing the property, which will not likely happen.

Too bad the setback is only 15 feet. I'd have much preferred 25.

Syndicated 2007-03-15 14:47:47 (Updated 2007-03-15 14:45:56) from Nonbovine Ruminations

Do your bits have Colour?

Thanks to Kat for pointing out this rather interesting attempt to explain legal relations to computer geeks. I like the Paranoia reference.

Of course, he's not entirely right: Copyright law doesn't deal with things as small as "bits". Copyright law doesn't care about the details of the representation; as far as the law is concerned, files generated by Monolith aren't distinguishable at all from the original content; they're just encoded in a funny way. And encoding doesn't matter. So, really, it's not about whether your bits have the "copyright Colour" (which, according to this guy, only exists for lawyers), but rather whether the bits exist at all. The "I can't distinguish this file from random noise" claim is simply false: you can distinguish it from random noise by decoding it. If it were random noise, it would still be random noise after decoding, but it's not.
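For the geeks, the decoding point can be shown in a few lines. This is a minimal sketch that assumes an XOR-style scheme of the kind the Colour essay describes; `xor_bytes` and the sample data are hypothetical and are not Monolith's actual code.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (illustrative helper only)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the original work.
original = b"Some copyrighted text."

# A random "basis" of the same length; on its own it really is noise.
basis = os.urandom(len(original))

# The encoded file looks like noise when inspected by itself...
encoded = xor_bytes(original, basis)

# ...but XORing it with the basis again recovers the work exactly,
# which genuine random noise would never do.
decoded = xor_bytes(encoded, basis)
print(decoded == original)
```

The encoded file only resembles noise for as long as nobody bothers to decode it, which is exactly why the law treats it as the same content in a funny wrapper.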

It is a truism that computer geeks routinely babble nonsense about copyright, usually because they use analysis techniques that are based on strict logicalism, instead of the goal-oriented legal reasoning methods that lawyers are taught and routinely use. In law, if a strict application of some rule leads to an absurd result, you ignore the result, and find some other rule to apply in that situation that doesn't lead to an absurd result. The law doesn't like absurd results (they're called "antinomies" in some older texts); judges tend to scoff at the idea that they should require something absurd and will generally find a way out if given the chance.

He does, however, get it basically right: "Colour" is not a characteristic of bit sequences; it's a characteristic of processes and of history. Ultimately, law isn't about things. It's about people, and more specifically the relationships between them. Nothing can be evaluated in a legal sense without knowing the context and history in which the evaluation is required. And that's a large part of why computers cannot, on their own, enforce laws: there is always the possibility of a context or a history which the designers of the "enforcement system" didn't envision. People have the flexibility to deal with that and move on; computers, not so much.

Syndicated 2007-03-15 03:03:24 (Updated 2007-03-15 03:01:35) from Nonbovine Ruminations

Is Wikimedia really committed to open source?

One of the principles that I was under the impression that Wikimedia follows is a commitment to use open source software whenever possible. As an organization committed to open content, the Foundation supposedly strives to use only open source software in its main production operations (and, exclusive of some router firmware, it does as far as I know). In addition, the Foundation is supposed to minimize its use of proprietary software for accessory uses, using open source options when available.

So, then, can someone tell me why Wikimedia has set up a Ventrilo server? There's nothing Ventrilo can do that Asterisk can't. Furthermore, Ventrilo only supports Windows clients, and charges a monthly licensing fee; Asterisk supports any SIP client (of which there are dozens) and (being open source) has no fees at all. On top of that, Ventrilo is merely a voice chat server; Asterisk is a full-blown telephony application, with voice mail, ACD, IVR, and basically everything you'd want in a regular PBX system, plus the ability to set up softphones (or, with minor hardware investment, hardphones) that work from anywhere.

I've been trying to convince the Foundation to set up Asterisk for over a year now, if for no other purpose than for the occasional conference call, and to make it possible for Danny to forward calls to Florence without having to deal with international calling issues. I've apparently been ignored on this issue, however. When the ComCom (or someone else in the Foundation) decided they needed voice conferencing that was less unreliable than Skype (which isn't saying much), instead of turning to a volunteer who has actual experience with voice-over-internet applications in the real world, they picked up a voice chat program that is marketed primarily to gamers.

I hope the Foundation hasn't spent too much on Ventrilo, or on the Windows machine that it's running on. I consider every cent spent on it wasted, when a deployable open source solution that does the same thing exists, and has existed for months, and, since it runs on Linux, wouldn't have cost a Windows install either.

Really underscores Wikimedia's need for a proper CTO, doesn't it?

Syndicated 2007-03-13 15:15:37 (Updated 2007-03-13 16:13:48) from Nonbovine Ruminations

Why do people edit Wikipedia?

I mentioned sourcing as the solution to the credentialing problem in my last post. And in thinking about some of what I wrote there, it occurred to me that Wikipedia's greatest problem has to do with "goal management" in the community. Wikipedia not only suffers from the lack of a meaningful goal statement itself (it defines itself as "a multilingual, web-based, free content encyclopedia project", but that doesn't really set measurable goals for the community), but also suffers because, as far as I can tell, the personal goals of many of its contributors do not align well with the overall project goal.

Really, why do people edit Wikipedia? And I don't mean "what draws people to Wikipedia". I want to know why people stay at Wikipedia. What addicts people to this site? And how does the personal pursuance of whatever personal wants these people are satisfying through their participation help make Wikipedia more of a "free content encyclopedia"?

I suspect that one of the major motivators is the desire to show off one's own knowledge. Quite a few people (including a number of the so-called "active contributors" who are currently claiming the right to VestedContributor status) are exercising this motive. And that's fine -- up to a point. There are at least two problems that come out of participants who are acting out of this motive. One of them is the tendency to claim ownership rights on one's own contributions. For these people, their contribution list, and especially their list of "featured articles" or otherwise specially-recognized content, becomes a substitute for penis size, and they thereby become unreasonably possessive of the articles they craft. They may also attempt to subvert the quality recognition processes in order to ensure that their own content is more likely to be favored with designations of quality, again in order to boost their own importance.

Another is the tendency to doggedly defend what they "know to be true" even when doing so is not defensible. Sometimes it's just due to lack of sourcing -- I could write a lot about computers, or electrical systems, or any of many other topics, based on my work experience, but I couldn't source most of it because I've learned what I know by osmosis, and don't have reference materials to back it up. Quite a lot of Wikipedia's content is of this origination. If Wikipedia had a dedicated corps of fact-checkers and fact-sourcers, this wouldn't be a problem; such articles would merely be treated as unverified drafts, presented as such, and eventually cleaned up to the next level. The problem is that that doesn't happen. The lack of such a corps is why the English Wikipedia has about a 9:1 crapticle:article ratio right now. As long as most of the content on Wikipedia is in the nature of "verbal diarrhea" (which so much of it is) this won't easily change.

Another major motivator is the urge to belong. Especially in the past year or so, Wikipedia has become one of the "places to be" on the Internet, and a large number of people have been attracted to Wikipedia for reasons that have rather little to do with Wikipedia being an encyclopedia. I have, in the past, referred to this as "playing the Wikipedia MMORPG". By and large, these people are teenaged boys, and virtually all of them are vandalism patrollers, some to the exclusion of all else. They have brought with them a number of undesirable dynamics: an expectation of hierarchy, a quest to achieve rank, a tendency toward dogmatic, unthinking application of rules, a propensity toward hazing those below them in their perception of the hierarchy, and a near-complete failure to understand the real purpose of Wikipedia in anything but the most superficial of ways. (That is to say, they usually claim to understand that Wikipedia is an encyclopedia, but have no idea what that means, and they have no grasp of broad policy principles, only the specific statements made in individual policies.) I strongly suspect that PDD spectrum disorders are pervasive in this element. The sad thing is that these people are eminently useful to Wikipedia: they tend to be organized and follow clear process well, many are good at analyzing detailed data, and they generally do not mind doing repetitive tasks over and over again. We should put them to work sourcing articles; instead, we expect them to manage interpersonal conflict, which by and large they are really really bad at, and to deal with vandalism and trolls, which requires more diplomacy and discretion than the average member of this group can produce.

Another motivation, not entirely unrelated to the above, is to be important, powerful, influential, or even just noticed. I think this explains the Essjay situation to some degree. It can also explain a lot of trolling (since trolls typically do what they do in order to be noticed) as well as the addiction to drama that seems to pervade certain segments of the Wikipedia community.

Other motivations include the desire to overthrow the copyright system, or even to overthrow capitalism, a genuine interest in creating free content, and altruistic desires to create something of value for mankind. These people tend to be more likely to have an appreciation for the broader principles of the project. And they tend to run into trouble because they're the minority in the community.

The main failing, in my opinion, is Wikipedia's failure to erect a structure that can be used to channel the people in the second group into useful work. I am reminded of the death of the CommuniTree (see Shirky's article). The near-total lack of rules and structure on Wikipedia at the time these teenaged boys started flowing in meant that these children could not be channeled into useful work by an existing structure; had there been one, most of them would have either flowed into it willingly or else rejected it and left. Because no such structure existed, they created one. And the one they created is designed not toward the purpose of creating a quality encyclopedia meeting certain fundamental core values, but instead toward the purpose of playing the Wikipedia MMORPG. Many of the current problems in the Wikipedia community stem from the fact that there is a large, active subcommunity which views itself as having vested rights and which is pursuing goals that are at least somewhat at odds with the foundation goals of the project.

Fundamentally, Wikipedia needs to contemplate what the motives of its editors are, and find ways to discourage those whose motives are unproductive or which tend to work at cross purposes to the project's purposes. I don't think much effort has been put into understanding why people edit Wikipedia; in most cases we don't know at all, and where we have any clue it's based on voluntary responses to a survey, which often fails to reveal self-serving or otherwise "undesirable" motives that respondents don't want to admit to.

The conclusion I keep coming back to is that the wide-open unrestrained editing model that Wikipedia started out with simply will not lead to long-term reliable quality. Worse, the community structures that have evolved in recent months seem designed to entrench the current miasma of low-quality articles; I fear that without major change Wikipedia has reached the endpoint of its evolution.

It's a shame that Meatball is down at the moment; there's a lot of good stuff there that is pertinent to this discussion.

Syndicated 2007-03-11 15:58:00 (Updated 2007-03-11 16:27:15) from Nonbovine Ruminations

Credentialism

So, in the wake of the Essjay scandal, the whole issue of Wikipedians claiming credentials has become a topic of rather a lot of discussion.

One of Sanger's long-standing disagreements with Wikipedia has been over Wikipedia's failure to recognize credentials. Citizendium has long had policies designed to give those having credentials special status as editors; recently, it withdrew the "honor principle" regarding credentials: Citizendium editors will now be required to validate their identities and credentials. Larry denies that the Essjay situation was involved, which I think is not entirely true, but anyway....

Still, the main topic of discussion has to be Jimbo's proposal regarding credentialing. Although which one to discuss: this one, where he suggests that degree-holders might fax a copy to the Foundation office? (Danny runs screaming to the hills, dreading the thousands of faxes of Damnation University diplomas that would inevitably result.) Or this older one? Or this one (which appears to be the most recent Jimbo proposal), which (horrors of all horrors) authorizes the creation of a "This user has a Ph.D." userbox, linked to a subpage that lets people describe their attempts to verify the holder's claims of their credentials?

I submit that Jimbo has completely missed the boat here. Wikipedia should not care if you're credentialed. Credentialing creates a stratified, classful editing culture based on something that doesn't relate all that well to editing competency. It's not a structure that will help Wikipedians write an encyclopedia -- although it is one that will help some people win petty combats over article content. The only good that verified credentials do is provide some vague smattering of reason to believe that when someone says "This claim is bollocks" they're right -- and Wikipedia already has a better way to deal with that, that avoids all the issues associated with credentialing: sources.

Quite simply, if you're going to assert that something is true based on your credentials, you should stop and instead assert that it is true based on a source -- which, as a credentialed academic, you should be able to produce with relative ease. This works even for statements which are "generally accepted by experts in the field" because such statements are invariably made in introductory-level textbooks in that field, which most academics in that field would be familiar with, or at least be able to put their hands on without much difficulty. This may make editing "less fun" because you can't just go "Hey, I know that" and spew out three paragraphs of unsourced claims that you know are true but can't convince anyone else of, but hey, nobody ever said that writing an encyclopedia was easy. Letting people get off without proper sourcing got Wikipedia to 1,678,137 crapticles; getting Wikipedia to half a million good articles is going to be a lot harder. (Really does remind me of writing open source software: it's easy to throw together a hunk of code that does something vaguely interesting, but both much harder and much less fun to turn that into a finished, polished product, which is why so many open source projects never make it much past version 0.2 or so.)

This solves the credentialing problem without actually requiring anybody to provide or verify credentials. With this asserted as practice, the only remaining issue is how to deal with someone who asserts something based on a claim of "expert credentials". And that's simple: demand that they produce a source.

So, Jimbo, please stop talking about credentialing. It's the wrong solution to a problem that doesn't actually exist, simply because the community has already solved it. The solution to Essjay's use of falsified credentials is not to provide a mechanism for verifying such claims, but instead to remind the community that such claims don't meet Wikipedia's pre-existing verification requirements and cannot be used to win a debate. Enforce the policy Wikipedia already has; don't make bad new policy just to smooth over a PR flap.

Syndicated 2007-03-10 06:35:00 (Updated 2007-03-10 06:40:49) from Nonbovine Ruminations

Yet another social networking site

Ho, hum. Yet another social networking site has apparently started. I won't tell you the name here; if you want to find it, go to "Mark's Moolah" and read the article there. I took a brief glimpse and it looks like it's mainly being used to promote webcams, gaming groups, and political groups. Surprise.

I don't see how this one is any different from orkut, friendster, or any of the other social networking entities. I have yet to see a social networking application that served any redeeming purpose.

Syndicated 2007-03-08 00:18:43 (Updated 2007-03-08 00:18:52) from Nonbovine Ruminations
