Older blog entries for robogato (starting at number 18)

Advogato's Aggravation

I've been pondering the problem of what to do about Advogato's article section on the main page. Aside from the various bugs and feature requests I've been working on, the single most common complaint I've seen about the site is the low quality of the articles. As I mentioned in an earlier post, this problem has been brought up before.

It seems to me that rather than worry too much about how to prevent the occasional bad articles, we should focus on how to encourage useful and interesting articles. The first step is to find a definition of what useful and interesting mean in the context of Advogato.

Obviously, articles about software design, standards, or related topics are always interesting. If you're working on a paper or a talk for an upcoming FOSS conference, consider posting a freely licensed draft as an article to get feedback. The occasional interview, question, insight, or advice from someone in the community can also be interesting. Unfortunately, past experience shows we can't expect many of these types of articles. That still leaves a pretty big gap that will likely be filled by noise if it isn't used for something more interesting.

There are already plenty of sites like Slashdot where one can find vaguely FOSS-related links to news stories. I don't think Advogato should go the route of becoming yet another aggregator of recycled news stories. While that's an easy solution and would probably generate a lot of traffic, it's not why we're here. In one of Raph's early postings about Advogato he said the purpose of the site is "to bring a group of people closer together, not to generate hits."

Robogato's Revelation

What is it that makes Advogato different from other Free Software/Open Source web communities? Most sites focus on a very particular FOSS sub-community: GNU, Apache, BSD, KDE, Mozilla, RedHat, Debian, FreeDesktop/X.Org, Perl, Python (to name just a few). Often, members of each community aggregate around each other, ignoring or forgetting what's going on in the larger FOSS community. Advogato, on the other hand, has active members from almost all these communities. This is one place where we can read each other's blogs and find out what's going on in other parts of the FOSS community.

When I realized what a unique position Advogato is in, it became obvious to me that one useful and interesting thing we can do is use the articles section to inform each other of what our respective communities have been doing on a weekly or monthly basis. Often the volume of news, blogs, and websites in each sub-community makes it difficult for an outsider to stay up to date.

As an illustration of this, I'm reminded of the LKML. The volume of the list makes it impossible for me to keep up - I simply don't have the time. However, I used to enjoy reading the Kernel Traffic summaries regularly so I'd have some idea of what the Linux developers were up to. Sadly, Kernel Traffic is no more. Likewise, there have been similar efforts to summarize activity in other communities (e.g. Brave GNU World, This Month in BSD, the gcc newsletter, WineHQ news, etc). Most of these are defunct, being replaced by dozens of individual websites, blogs, and mailing lists.

What I propose is recruiting Advogato users from each of the many FOSS communities to write and post a periodic summary of significant events in their respective groups. I'm willing to work with these volunteers to devise a useful format and a system for assembling the reports. This will take some time to get going so I think the best plan is to focus on the communities one by one, working out the system and getting things started, then moving on to the next group. As a start, I've written an example summary of the GNU project's activities this month. I've worked out where to get the information and how to assemble it into a simple format. I'll post it shortly as an article. What I need now is just one volunteer willing to contribute an hour of their time once a month to assemble and post a GNU update. Who's up for the job?

The next question is what FOSS community would you like to see a monthly summary of next? Ruby? Perl? BSD? I need suggestions and volunteers. gato@advogato.org

2 Feb 2007 (updated 3 Feb 2007 at 21:38 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details.

This rev adds FOAF files to our user profiles, helping to make Advogato part of the Semantic Web. Each account profile page has a visible FOAF link as well as an auto-discovery meta link that points to a foaf.rdf file for that account. At present the FOAF files have minimal properties. The FOAF standard allows for some additional features that will probably be added over time. At present, outbound trust certifications are converted to foaf:knows properties. Inbound certs are ignored. Project relations are exported as foaf:currentProject properties. To get an idea of what you can do with FOAF, try using the DISCO Hyperdata Browser to view the FOAF file of an Advogato seed account such as Raph's (see also the FOAFer result for the same file).
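
As a rough illustration of the mapping described above (a Python sketch, not mod_virgule's actual C code), the following builds a minimal FOAF document in which each outbound certification becomes a foaf:knows property and each project relation becomes a foaf:currentProject property. The function name build_foaf, the use of foaf:nick to identify accounts, and the example.org URL are assumptions made for the example; only the two property names come from the description above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def build_foaf(nick, outbound_certs, project_urls):
    """Return an RDF/XML string describing one account (hypothetical helper)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}nick").text = nick

    # Each outbound certification becomes a foaf:knows property.
    # Inbound certifications are deliberately not represented.
    for other in outbound_certs:
        knows = ET.SubElement(person, f"{{{FOAF}}}knows")
        known = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(known, f"{{{FOAF}}}nick").text = other

    # Each project relation becomes a foaf:currentProject property.
    for url in project_urls:
        ET.SubElement(person, f"{{{FOAF}}}currentProject",
                      {f"{{{RDF}}}resource": url})

    return ET.tostring(root, encoding="unicode")


# Example with made-up account names and a placeholder project URL.
print(build_foaf("alice", ["bob", "carol"], ["http://example.org/proj/demo"]))
```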

In addition to the new FOAF badge, you may have noticed some other very minor changes on the user profile. I've done a little HTML clean up and correction. The old, ugly RSS image has been replaced with the standard feed icon established by the Mozilla Foundation. Combined with our new RSS 2.0 feeds, this almost makes it look like Advogato is a modern website. :-)

Among other minor changes, trust certifications now include a date stamp. This will allow the future addition of date-dependent trust features such as age-based certificate expiration for inactive users.

All of the admin functionality of mod_virgule has been moved to a single base URL where it can be password protected. This includes the diagnostics page and crank pages for diary ratings, trust metrics, and the aggregator. Several of these pages were security risks either by leaking information about the server configuration or by being CPU intensive enough to be useful for DoS attacks.

Certification dialog

cdfrey notes in his blog:

"I just noticed something new in the advogato pages. When looking at a user, you get the following warning:

Note: By certifying a user you are making a public statment that you know this person and can vouch for their identity.

When did this happen?

I must disagree with this sudden pseudo-gpg keysigning level of certification, especially since this warning is now retroactively applied to people's previous certifications, by mere virtue of being tacked on the bottom of the list."

The new text appeared on Oct 1, 2006 when Advogato was migrated to the newer version of mod_virgule. The message is hard-coded in the module that creates the user profile page and was originally added, not for Advogato, but for robots.net some years before.

On robots.net, the users are not all programmers and many don't have previous experience with any sort of trust metrics. As a whole, the user base had begun to view the trust metric system as nothing more than a group-powered method of allowing other users to post on the site. As a result there was a huge amount of cert inflation (even compared to Advogato) with a large percentage of the user base tending toward Master certification. Many users were automatically certifying all new users as Masters, assuming this would allow them to post and therefore improve the community. In reality, it just increased the noise and spam level, of course.

I experimented with a variety of short messages under the cert dialog to impress upon people that by certifying someone, they bore some responsibility for the results. This particular message seemed to have the most dramatic effect and, over time, solved our problem.

I agree it's unnecessary for Advogato since most users here understand to one degree or another what the trust metric is for. I'll take a look at making this page more easily configurable on a site-by-site basis. That will allow us to use different text on Advogato or remove the message altogether.

With regard to the actual meaning, I didn't intend for "know this person" to mean only that you've met them in person, in meatspace. You might also know them in some other online capacity outside of Advogato. You might know them through email, IRC, another website, etc. In some cases, you might even get to know them by reading their blog on Advogato long enough to feel comfortable expressing some trust for them. I assume Raph meant something similar in his original cert instructions when he says to certify "free software developers you know". My understanding of the trust metric is that you're certifying to the community that you trust the subject really is who they claim to be (at least to the extent that they claim to be a member of the free software community).

Advogato Status Report

I'm working on more code improvements but it will probably be next week before anything interesting emerges. In the meantime...

FAQ: I've added the beginnings of an Advogato FAQ to the site to help cut down on the time I spend answering emails. At present, there's no index and the questions are roughly in order of how frequently they're asked. (okay, one or two I just made up - I suppose they're Frequently Imagined Questions!). Have a look and don't hesitate to point out errors or new questions that need to be added.

FOAF: Does anyone have any strong opinions on FOAF? Someone requested we add a FOAF file for each profile and this looks like it would be relatively easy to do. I'm not entirely sure I grok what the point of it is though. Does anyone, anywhere actually use FOAF RDF files for anything useful? Would it be a Good Thing if we suddenly add 10,000 people to the FOAF-o-sphere?

Article Quality: One thing that still seems to need fixing on Advogato is the quality of articles posted on our home page. At present every trusted Advogato user has the freedom to post articles. Unfortunately, not every trusted Advogato user has the ability to post relevant, quality articles. Is there a way to enforce quality without taking away everyone's freedom to post? For background see these two previous Advogato discussions on this subject:

Advogato Status Report

If you haven't been following the saga of Netscape and the RSS 0.91 DTD, here's the summary: On Jan 12 the folks at Deviceforge noticed that Netscape had removed the DTD from their website sometime after Jan 1, 2007. After Slashdot picked up on it, enough people complained that even someone at Netscape acknowledged the problem.

Yesterday, we got an official pronouncement from Netscape. They've agreed to restore the DTD but only until July 1, 2007 after which it will be removed again. Why? According to Netscape, your application shouldn't be "relying on the availability of a static document on a third-party Web server" like, say, a DTD. It's not clear what will happen to RSS 0.91 after July 1. Maybe Netscape will transfer their copyright on the DTD to the W3C and the URL will change. Maybe everyone will have to update their RSS software to ignore the DTD. Maybe everyone will stop using RSS 0.91. Who knows.

Why do we care? Because mod_virgule has always generated RSS 0.91 feeds for the articles on the main page and the user blog feeds. Most RSS readers don't bother to check the DTD but many do, and if the DTD is gone, no more Advogato feed. There was already a task on the ToDo list to bump all our feeds to RSS 2.0, so I did that today as it seemed like the easiest way to bypass the whole issue. All Advogato feeds are now RSS 2.0. I also added some of the optional tags that make life easy for aggregators like guid and pubDate.
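
For readers unfamiliar with the optional tags mentioned above, here is a small Python sketch of an RSS 2.0 item carrying guid and pubDate. It is not the mod_virgule implementation; the helper name rss_item, the choice to reuse the permalink as the guid, and the example.org URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_item(title, link, posted_utc):
    """Build one RSS 2.0 <item> element (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link

    # guid gives aggregators a stable identity for the entry, so an edited
    # post is not mistaken for a new one. Reusing the permalink is an
    # assumption of this sketch.
    guid = ET.SubElement(item, "guid", {"isPermaLink": "true"})
    guid.text = link

    # pubDate uses the RFC 822 style date format with a four-digit year.
    # posted_utc must be timezone-aware and in UTC for usegmt=True.
    ET.SubElement(item, "pubDate").text = format_datetime(posted_utc, usegmt=True)
    return item


item = rss_item("Status report", "http://example.org/diary/1",
                datetime(2007, 1, 15, 22, 11, tzinfo=timezone.utc))
print(ET.tostring(item, encoding="unicode"))
```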

15 Jan 2007 (updated 15 Jan 2007 at 22:11 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details. This upgrade required taking Advogato offline for about an hour to modify the XML database.

Until today, mod_virgule has stored timestamps in the XML data store that reflected the server's local time zone. The code then made assumptions about the time zone when rendering articles, posts, or RSS feeds. Prior to 3pm, 1 October, 2006, the server's local time zone was US Pacific time. When Advogato got transferred to our hosting facility, the new server was using the US Central time zone. This created a further complication because of the two hour time shift. Adding the blog aggregator made things worse because 99% of the incoming blog feeds use UTC timestamps.

Having to juggle three time zones on a regular basis was creating a bit of a headache for me. I decided it was time to get things under control before the code got so complicated that only a Time Lord from Gallifrey could understand it. So mod_virgule now uses UTC for everything. The code changes were relatively straightforward but normalizing Advogato's rather large XML data store was another matter. I wrote a Perl program that recursively scanned Advogato's 30,000+ XML files looking for timestamps in several different formats and adjusted them to UTC (which required a different offset depending on whether they were recorded before or after 3pm, 1 Oct, 2006). That's the reason for the brief downtime.
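
The core of that conversion can be sketched in a few lines. The actual program was written in Perl and is not shown here; this Python version only illustrates the idea of choosing the source time zone by a cutoff. The name to_utc is hypothetical, and comparing the cutoff against the stored (zone-less) value is an assumption, since the text does not say which zone the 3pm cutoff is expressed in. The fixed offsets below are only valid for dates in daylight time; any tzinfo object that knows the US daylight saving rules can be passed instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the cutoff is compared against the stored naive value.
CUTOFF = datetime(2006, 10, 1, 15, 0)


def to_utc(stored, tz_before, tz_after, cutoff=CUTOFF):
    """Convert a naive stored timestamp to UTC (hypothetical helper).

    Timestamps before the cutoff are read as tz_before (the old server),
    later ones as tz_after (the new server).
    """
    tz = tz_before if stored < cutoff else tz_after
    return stored.replace(tzinfo=tz).astimezone(timezone.utc)


# Fixed daylight-time offsets, used here only to keep the sketch short.
PDT = timezone(timedelta(hours=-7))  # US Pacific, daylight time
CDT = timezone(timedelta(hours=-5))  # US Central, daylight time

print(to_utc(datetime(2006, 9, 30, 12, 0), PDT, CDT))
print(to_utc(datetime(2006, 10, 2, 12, 0), PDT, CDT))
```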

So, anyway, we're back up and everything should be working the same as always aside from being on UTC time rather than Central time. If anyone notices any breakage, let me know.

5 Jan 2007 (updated 7 Jan 2007 at 02:47 UTC) »

Advogato Status Report

The first new rev of mod_virgule code for 2007 went live today. See the changelog for the details. Basically, it's all bug fixes.

The important one is a rewrite of the diary entry storage code. For users whose posts arrive via syndication, the new code will allow local editing and xml-rpc editing without the save wiping out all the extra XML tags that store syndication state info. This bug was causing the occasional duplicate of syndicated posts (and it's why I warned against mixing local blogging with syndicated blogging when we turned on the aggregator).

Update: Hmmmm... okay, there's still at least one other problem with mixed local and syndicated blogging that can lead to duplicated entries. I'll see if I can track it down soon...

Update 2: Fixed what should be the last issue causing problems for mixed posting. It may actually be safe now. Unfortunately, I discovered one more cause of duplicated posts. There's an RSS variant that retroactively alters the post time of an entry each time it's edited, which confuses our simple little aggregator into thinking it's a new post. Working on a fix now. The world would be such a nicer place if everyone used a sane syndication method like Atom...

Update 3: RSS feeds with shifting date stamps should now be handled a little better. At least if the feed in question has unique item identifiers (some do, some don't - you never know what you'll get with RSS).
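
One plausible way to get this behaviour, sketched in Python rather than taken from the mod_virgule source: identify an entry by its unique identifier when the feed supplies one, and only fall back to the timestamp when it doesn't. The dictionary keys ("id", "link", "published") and both function names are assumptions for the example.

```python
def entry_key(entry):
    """Prefer the feed's own identifier; fall back to link plus timestamp."""
    if entry.get("id"):
        return ("id", entry["id"])
    return ("stamp", entry.get("link"), entry.get("published"))


def filter_new(entries, seen):
    """Return entries not seen before, recording their keys in `seen`."""
    fresh = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh


seen = set()
filter_new([{"id": "post-1", "published": "2007-01-05T10:00:00Z"}], seen)
# The same post re-published with a shifted date stamp is not treated as new.
print(filter_new([{"id": "post-1", "published": "2007-01-06T09:30:00Z"}], seen))
```

Without an identifier, a changed date stamp still yields a different key, which is why feeds lacking unique item identifiers can still produce duplicates.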

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details - but only if you're really bored. There were only very, very minor changes. With the holidays coming up, I'm not sure how much time I'll have to work on the code over the next couple of weeks. So don't expect any spectacular new features.

What would be nice is seeing one shiny new article posted on Advogato before the end of December. If any Advogato users presented at the recent OSDC and have an interesting paper, maybe you could post it here as an article. Just a thought.

Advogato and Greenhouse Gas

I noticed pphaneuf's post about Second Life, computer power consumption and the relation to CO2 emissions. I may not have mentioned before that the server Advogato is hosted on now, and our entire little facility, is powered by 100% wind-generated power. We recently got our EPA Green Power Partner approval. I've never calculated the electricity used by just Advogato but overall we use about 4,000 kWh per month. According to most estimates I've seen, this translates into 6,000 - 8,000 pounds of CO2 that we avoid putting into the air each month. And we aren't the first. I've seen several other hosting facilities that have gone to 100% non-polluting power providers. Here in Texas, it's actually saving us money too, since the cost of wind tends not to be affected much by the rising cost of gas and coal. So maybe some of the Second Life users should ask about that.

7 Dec 2006 (updated 7 Dec 2006 at 01:37 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details.

I've added support for a couple of additional RSS variants with ever more unusual date stamp formats. In theory the RSS pubDate tag is supposed to use the date format described in RFC822. The first problem is that RFC822 allows a lot of variation. The second problem is that RFC822 specifies a two-digit year. For obvious reasons most RSS feeds use a four-digit year. Mod_virgule's first line of defense is to call the Apache APR routine apr_date_parse_rfc(), which will parse all date strings that actually comply with RFC822, plus nine variants that are not strictly RFC822 compliant but are commonly seen in the wild. So far, at least one common blogging app, Blosxom, produces a pubDate field that is not RFC822 compliant and can't be parsed by apr_date_parse_rfc(). I've added a custom strptime() call that handles these. A patch for the Apache APR folks is in the works.

Some RSS feeds don't have a pubDate tag at all. Instead they have a date tag which, instead of RFC822, contains an RFC3339 formatted date string. This is actually much nicer, since it's a slightly more sane format and is the same one used in Atom feeds, so we already have code for handling it.
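
The layered approach described in the last two paragraphs (try the RFC822 style parser first, then other formats) can be sketched in Python using only the standard library. This is an analogue, not the C code: parsedate_to_datetime stands in for apr_date_parse_rfc(), and the extra_formats argument stands in for the custom strptime() call. The function name parse_feed_date and the sample pattern are hypothetical, and the pattern is not the actual Blosxom format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text, extra_formats=()):
    """Parse a feed date stamp, trying progressively less standard formats."""
    text = text.strip()

    # 1. RFC 822 style dates, as expected in pubDate.
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        pass

    # 2. RFC 3339 dates, as used in date tags and in Atom feeds.
    iso = text[:-1] + "+00:00" if text.endswith("Z") else text
    try:
        return datetime.fromisoformat(iso)
    except ValueError:
        pass

    # 3. Caller-supplied strptime patterns for non-compliant feeds.
    for fmt in extra_formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue

    return None  # no usable date stamp


print(parse_feed_date("Thu, 07 Dec 2006 01:37:00 GMT"))
print(parse_feed_date("2006-12-07T01:37:00Z"))
print(parse_feed_date("2006/12/07 01:37", ["%Y/%m/%d %H:%M"]))
```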

Speaking of Atom, the mod_virgule aggregator now supports the old, deprecated Atom v0.3 feeds in addition to the current Atom v1.0 standard.

So here's what we support right now:

  • Atom 0.3
  • Atom 1.0
  • RSS 0.91 *(only if optional pubDate or date tags are included)
  • RSS 0.92 *(only if optional pubDate or date tags are included)
  • RSS 2.0
  • RDF Site Summary 0.9 *(untested)
  • RDF Site Summary 1 *(all variants seen so far work)
  • RDF Site Summary 1.1 *(untested)

I wish I could support the RSS 0.91/0.92 feeds that don't have any sort of time or date stamps at all but it would require some reworking of the code in the aggregator that sorts out which posts are new and which have been seen before. In most cases RSS 0.91/0.92 allows the use of both date and pubDate, so if you make sure those tags are included, things should work fine. Otherwise, your best bet is to use something a little more recent like RSS 2.0 or Atom 1.0.

The other update this week was a performance improvement. Each hour the trust metric and blog interest eigen vector ratings are recalculated. The eigen vector recalculation takes several minutes to complete. In the past the process held a read lock on the XML database, preventing any other process from taking a write lock. This caused some operations on Advogato to block (such as clicking on the "Read more..." link of articles, which writes an update to the user's "last read" pointers). This problem is now fixed. The site should seem significantly less sluggish at the top of the hour when the update runs. The eigen vector processing now releases the read lock and gives up its time slice, then re-acquires the lock on each iteration. The total processing time is slightly longer (from 3 minutes to 3.25 minutes) but during that time the site can be used normally without feeling slow.
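
The locking pattern can be illustrated with a toy power iteration in Python. This is a sketch of the general technique, not the mod_virgule code: a plain threading.Lock stands in for the database read lock, time.sleep(0) stands in for giving up the time slice, and the names db_lock, power_iteration and read_matrix are hypothetical.

```python
import threading
import time

db_lock = threading.Lock()  # stand-in for the XML store's read lock


def power_iteration(read_matrix, n, iterations=50):
    """Estimate the dominant eigenvector, releasing the lock each pass."""
    vec = [1.0 / n] * n
    for _ in range(iterations):
        # Hold the lock only while data from the store is being read.
        with db_lock:
            matrix = read_matrix()
            nxt = [sum(matrix[i][j] * vec[j] for j in range(n))
                   for i in range(n)]
        # Lock is released here, so a waiting writer can get in.
        total = sum(abs(x) for x in nxt) or 1.0
        vec = [x / total for x in nxt]
        time.sleep(0)  # yield the time slice before re-acquiring the lock
    return vec


# Tiny example: the dominant eigenvector of this matrix points along axis 0.
print(power_iteration(lambda: [[2.0, 0.0], [0.0, 1.0]], 2))
```

The trade-off is the one noted above: re-acquiring the lock on every iteration adds a little overhead to the total run, in exchange for writers never waiting longer than one iteration.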

