2 Jun 2011 robogato   » (Master)

Robogato Returns

We had a bad hardware crash recently and, as I was restoring Advogato to new hardware, I realized that it's been too long since I've devoted any significant time to improving the code around here. I took advantage of the downtime caused by the crash to make some final tweaks to the long-awaited libxml2-based HTML parser and put it live. It already fixes a lot of the rendering problems and will fix more once I make a few more tweaks.

I'm also working on improving security in general and making account creation by spammers harder in particular. I had a nice email exchange with dkg about the subject a while back. He took a look at the code and provided a laundry list of things that needed fixing or improving. I'm working on those now. The first change just went live this week: mod_virgule now requires the POST method for submitted forms. This minor change has already stopped a couple of our automated account spammers who were creating accounts with GETs. Only the dumbest spammers were doing that, I'd think; using POST isn't much harder. More changes to come.
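To illustrate the idea behind the POST requirement, here is a minimal sketch in Python. It is not mod_virgule's actual code (which is C running inside Apache); the paths and the function name are hypothetical and exist only for this example.

```python
from http import HTTPStatus

# Hypothetical paths standing in for handlers that process a submitted form.
# The real mod_virgule paths are not shown here.
FORM_SUBMIT_PATHS = {"/account/create", "/diary/post"}


def check_method(method: str, path: str) -> HTTPStatus:
    """Decide how to answer a request before any form processing happens.

    Submissions to a form handler must arrive as POST; anything else is
    refused with 405. Every other path is left alone.
    """
    if path in FORM_SUBMIT_PATHS and method.upper() != "POST":
        return HTTPStatus.METHOD_NOT_ALLOWED
    return HTTPStatus.OK


if __name__ == "__main__":
    print(check_method("GET", "/account/create").value)   # refused
    print(check_method("POST", "/account/create").value)  # accepted
```

Note that the check is keyed on the submission paths only. A page that merely displays a form is normally fetched with GET, so it should stay outside the gated set.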

If you're wondering what caused the increase in spam accounts we've been seeing for the last year, here's a possible contributor: Incansoft, apparently a purveyor of web-based spam tools, added an Advogato attack to a spamming tool they sell called Web20Bot (sorry, not going to link to it, but you can google it). Web20Bot will create phony account profiles containing your backlink spam on 20 websites including Advogato.org, squidoo.com, wordpress.com, blogger.com, tumblr.com, and livejournal.com. They claim Web20Bot handles email verification and captchas, so working out a defense may be interesting. I doubt any of their spam lasts more than 48 hours around here anyway, but it would be nice to make life harder for them. (Incidentally, if someone were to come up with a copy of this thing so we could analyze it, that might be cool. Maybe we could help other sites being attacked by it too.)

Update: Thanks for pointing out those issues, Redi. I've fixed the diary edit problem; it should not have been checking for a POST. The <person>, <project>, and <wiki> tags were special cases in the old HTML handler. If one is broken, all three probably are. I'll get on that now, though it will take me a little while to track down the problem. <proj> was deprecated in favor of <project> way back in the Raph days, but the code checking for <proj> wasn't dropped until this most recent update. I didn't realize anyone still used it. I can add it back in.

Update 2: OK, found the problem. The old tag handlers output directly to the Apache buffer, while the new handlers modify the XML tree, which is rendered to the buffer later. I need to modify or replace the handlers for those three tags. I'll try to get to it today if time allows.
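For readers curious what "modify the XML tree" means in practice, here is a rough sketch. It uses Python's standard xml.etree.ElementTree rather than libxml2, and the URL prefixes and function names are assumptions made up for the example, not what mod_virgule actually does. The point is only that a tree-based handler rewrites nodes in place and leaves serialization to a later step, instead of writing HTML to the output as it goes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical URL prefixes for the special tags; the real site layout may differ.
# <proj> is kept as an alias of <project>. <wiki> is omitted to keep this short.
TAG_URL_PREFIX = {
    "person": "/person/",
    "project": "/proj/",
    "proj": "/proj/",
}


def rewrite_special_tags(root: ET.Element) -> None:
    """Turn special-case elements into ordinary <a href="..."> links, in place."""
    for elem in root.iter():
        prefix = TAG_URL_PREFIX.get(elem.tag)
        if prefix is None:
            continue
        name = (elem.text or "").strip()
        # Renaming the node and adding an attribute keeps its text and its
        # position in the tree, so the surrounding content is untouched.
        elem.tag = "a"
        elem.set("href", prefix + quote(name) + "/")


def render(markup: str) -> str:
    """Parse, rewrite the tree, and only then serialize it to a string."""
    root = ET.fromstring(markup)
    rewrite_special_tags(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render("<p>See <person>redi</person> and <proj>mod_virgule</proj>.</p>"))
```

An old-style handler that printed its link straight to the output would be out of step with this flow, since nothing else has been written yet when the tree is still being built.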

Update 3: I think the special tag issue is fixed now. Let's try this code for a day or so and see if any problems show up.

<person> test: redi

<proj> test: mod_virgule

<project> test: mod_virgule

<wiki> test: WikiPedia:Advogato.org

