Today, the GNUPedia Project came into being, its goal to develop a free encyclopedia. This could become a fantastic resource, but it seems in need of help on the technical side so that it starts off on solid ground.
From the articles, comments and diary entries here on Advogato it seems that many of us could offer valuable advice, particularly as we all participate in a peer-reviewed open community.
At the moment, they are asking for entries to be contributed, but they currently have no infrastructure in place to deal with them; the emphasis seems to be on `get contributions now, standardise them later'. This seems to me like a recipe for disaster, and I for one would hate to see such a promising project founder for such a trivial reason.
Many of us `Advogatoans' seem to have experience in projects with large amounts of data, web interfaces and all the other things that are required: have a look at their webpage, subscribe to their mailing list and have a say before it's too late.
This sounds like a more open redux of what everything2 is doing, although they've gone the way of extreme cross-referencing and rather chaotic information storage, whereas it sounds like the GNU folks are trying to do the traditional encyclopedia idea.
Perhaps there are good ideas to be gleaned from e2's model?
The idea of a free encyclopaedia is a good one, but maintaining signal to noise is going to be a major problem with it.
This is lesson one that we can learn from e2: there is some very good stuff, but also some stupid, opinionated, illiterate and downright incorrect stuff.
I was a little surprised to see so many grammatical and spelling errors on the GNUPedia home page (diff file duly emailed to the maintainer), and I am particularly pained to see that no format has been decided upon so far.
So... some questions to the crowd:
About two years ago, I did some research into doing a similar project. I'm a huge fan of encyclopedias (this sounds weird, but it's true. I've been known to sit around and just read a good encyclopedia for hours), and I take an active interest in information organization.
That project never came to fruition, but reading the information about Gnupedia, I see some major problems with the approach. First of all, the storage of information in pure HTML seems to be a bad choice for a heavily cross-referenced hypertext encyclopedia. Decentralized storage without cataloging or a format that explicitly knows about mirrors seems to be extremely non-robust and difficult to navigate. And, perhaps most importantly, the lack of editorial control seems to be a major mistake. The editorial and quality control is what makes something an encyclopedia. Without that, it's just a collection of random webpages that happen to be under a particular license.
While I see RMS' arguments that editorial control could be subject to political bias, etc., I'm sure there are better ways to limit that than to remove editorial and quality control completely.
Also, this will have a hard time competing with projects like Nupedia, which is also distributed under a free license (as far as I can tell) and is written with editorial control and, perhaps more importantly, largely by academics and scholars in their respective fields (making it more similar in organization to a regular encyclopedia).
Oh well. At least the free license will mean that someone can snarf all the content from Gnupedia, collect it on one site (or tarball, or CDROM), reformat it in a useful format (given that people at least use the hypertext capabilities of HTML properly, which I'm not sure they will), and make something that's somewhat useful.
As it is, the design decisions they've made seem to be well-suited to dooming the project to being a novelty, not really a useful tool.
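To make the storage point concrete, here is a rough sketch of what an entry format that "knows about" cross-references and mirrors could look like. This is purely my own illustration, not anything GNUPedia or Nupedia has proposed: the `Entry` fields, the function names and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """One encyclopedia article with explicit metadata (hypothetical schema)."""
    title: str
    body: str
    see_also: List[str] = field(default_factory=list)  # titles of related entries
    mirrors: List[str] = field(default_factory=list)   # URLs that host a copy


def build_catalog(entries: List[Entry]) -> Dict[str, Entry]:
    """Index entries by title so cross-references can be resolved."""
    return {entry.title: entry for entry in entries}


def dangling_references(catalog: Dict[str, Entry]) -> List[Tuple[str, str]]:
    """Return sorted (source, target) pairs whose target has no entry yet."""
    missing = []
    for entry in catalog.values():
        for target in entry.see_also:
            if target not in catalog:
                missing.append((entry.title, target))
    return sorted(missing)


# Made-up sample data, only to exercise the functions above.
entries = [
    Entry(
        title="Encyclopedia",
        body="A reference work.",
        see_also=["Diderot", "Hypertext"],
        mirrors=["http://mirror-a.example/encyclopedia"],
    ),
    Entry(title="Hypertext", body="Linked text.", see_also=["Encyclopedia"]),
]
catalog = build_catalog(entries)
print(dangling_references(catalog))
```

With cross-references held as data rather than buried in free-form HTML links, a central catalog can at least report which referenced articles do not exist yet and where copies of each article live.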
It turns out that Nupedia is released under the GNU FDL, and that there has been lots of discussion of the relationship of the two projects on the mailing list. Checking out the archives is probably a good idea.
I strongly agree with RMS that we shouldn't let corporations seize control of the public knowledgebase. This is what we should be focusing our attention on, not so much what technologies we should use. That will rise up fairly naturally from this community. A useful endeavor would be to create a project to find all existing free e-texts (those whose copyrights have expired) and organize them by category. If these are presented in a useful manner, with excellent searching and easily readable text, it would be a huge boon to research. This in and of itself would be a huge accomplishment. I think we would find it easier to take on the task of an encyclopedia if we had something like this under our belt already. Also, you would have public domain encyclopedias that could be mined for good content. We should always be working with the existing knowledgebase to advance it further. I think the free software movement has the potential to re-democratize the United States and other countries, but we need to take small, manageable steps. Just like what Diderot said as he wrote the first encyclopedia.