<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for robogato</title>
    <link>http://www.advogato.org/person/robogato/</link>
    <description>Advogato blog for robogato</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Thu, 23 May 2013 19:47:13 GMT</pubDate>
    <item>
      <pubDate>Sun, 5 Feb 2012 08:30:24 GMT</pubDate>
      <title>5 Feb 2012</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=36</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=36</guid>
      <description>As you probably noticed we're under attack by spammers again. Heavy account creation and blog spamming wiped out the recentlog. It's partially recovered and should be back to normal in another few hours. Account creation is off for now so that should prevent further spamming but the site may be slow due to the heavy traffic generated by the spammers. Looks like a botnet or multiple proxies being used. If anyone's interested in doing a little research on their own, here are a few of the many IPs from which the spam is originating: 173.208.47.67, 218.186.17.251, 190.212.92.132, 99.129.227.221, 86.122.20.133,  61.140.173.221, 67.72.247.233, 176.9.33.251, 110.4.89.20, 122.177.153.205, 66.56.158.67, 72.64.98.16, 79.141.172.14.</description>
    </item>
    <item>
      <pubDate>Thu, 26 Jan 2012 23:25:13 GMT</pubDate>
      <title>26 Jan 2012</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=35</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=35</guid>
      <description>&lt;p&gt;&lt;b&gt;More Minor Security Updates&lt;/b&gt;&lt;/p&gt;&lt;p&gt;I declared an Advogato hacking day today and got a little more work done on our security ToDo list. I've added a set of cryptographic nonce functions to generate tokens for email verification and CSRF prevention. The tokens have configurable expiration times. The new code replaces the hard-coded token generation used by the original cookie functions.&lt;/p&gt;&lt;p&gt;I also added a generic email function that can be used for account verification. This replaced the hard-coded part of the password recovery email function.&lt;/p&gt;&lt;p&gt;I was able to get the CSRF token code integrated with the account creation forms. It's tested and live. Hopefully this will knock out a few more of our automated account spammers including the commercial Incansoft spamming tools. I've still got a little more work to do before I can turn on the email verification but we're nearly there.&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Mon, 12 Sep 2011 16:00:12 GMT</pubDate>
      <title>12 Sep 2011</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=34</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=34</guid>
      <description>&lt;p&gt;&lt;b&gt;Status Update&lt;/b&gt;&lt;/p&gt;&lt;p&gt;Advogato has been under a sustained attack from spammers since 11:00 UTC Sunday. The attack is originating from a botnet of at least several hundred nodes with world wide distribution. The attack is automated and creates 10 to 20 new user accounts with large, spam-filled blog posts every minute. I discovered the attack around two hours after it started and immediately turned off new account creation.&lt;/p&gt;&lt;p&gt;Mod_virgule buffers the 100 most recent new accounts for display in the "recent people joining" box on the front page. The attackers had blown past that number pretty quickly, requiring me to use the web server logs to track down and remove the bad accounts. Once removed, it left the recent accounts buffer completely empty. It will fill up again once I'm able to turn new account creation back on.&lt;/p&gt;&lt;p&gt;I spent a while Sunday logging and blocking IPs for individual nodes of the attacking botnet but basically gave up after blocking the first hundred or so. With account creation off, the attackers fail to create accounts and what we're left with is a low-level DDoS attack. The bandwidth being used isn't disabling and hopefully the attacker will give up once they realize no new accounts are being created.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Other Fun&lt;/b&gt;&lt;/p&gt;&lt;p&gt;The switch to the libxml2 HTML parser solved a lot of internal problems but as some of you have noticed, it introduced a new one. Libxml2 "thinks" in XML and when it comes across a set of HTML tags with no content, such as &amp;lt;em&amp;gt;&amp;lt;/em&amp;gt; it turns that into a self-closing tag: &amp;lt;em /&amp;gt; which is great if you're viewing the result with an XML parser but most browser HTML parsers can't parse certain tags as self-closing and see the tag as an open with no corresponding close. 
This has the effect of including all the subsequent markup on the page inside the offending tag, usually terminating display of the page.&lt;/p&gt;&lt;p&gt;It looks like only a handful of tags produce this effect, so it should be possible to filter them out. It may be possible to drop empty tag pairs before parsing or convert them back to open/close pairs.&lt;/p&gt;&lt;p&gt;&lt;b&gt;&lt;a href="http://www.advogato.org/person/redi/diary/249.html" &gt;Redi&lt;/a&gt;&lt;/b&gt;: in theory, yes, but the mod_virgule codebase is a scary mix of HTML 4 (and earlier), XHTML, and XML. Throw in the random markup coming in from syndicated blogs and the resulting tag soup is very difficult to normalize without breaking something. However, incoming blog markup was previously being normalized to XHTML by libxml2 and I'm now thinking we may have to switch that to HTML 4 to force the open/close tags. The function you mention produces different output depending on what markup type is specified on the tree (or on the individual node). So, parse the blog, walk the tree forcing it all to HTML 4, then ask libxml2 to export it. Maybe... I'm doing some work on the code today, so I'll let you know.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Another Update&lt;/b&gt;: I've got some code changes in that might (or might not) help with the broken tag problem. We'll have to see if any incoming blog posts break anything over the next day or so. Nothing new on the spam attack; it's still going strong. I'm going to look at implementing a few more security features in the code that might allow us to turn account creation back on without waiting for the attack to subside.&lt;/p&gt;</description>
    </item>
    <item>
      <pubDate>Thu, 2 Jun 2011 23:15:35 GMT</pubDate>
      <title>2 Jun 2011</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=33</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=33</guid>
      <description>&lt;p&gt;&lt;b&gt;Robogato Returns&lt;/b&gt;&lt;/p&gt;&lt;p&gt;We had a bad hardware crash recently and, as I was restoring Advogato to new hardware, I realized that it's been too long since I've devoted any significant time to improving the code around here. I took advantage of the downtime caused by the crash to make some final tweaks to the long-awaited libxml2 based HTML parser and made it live. It fixes a lot of the rendering problems already and will fix more once I make a few more tweaks.&lt;/p&gt;&lt;p&gt;I'm also working on improving security in general and making account creation by spammers harder in particular. I had a nice email exchange with &lt;a href="http://www.advogato.org/person/dkg/" &gt;dkg&lt;/a&gt; about the subject awhile back. He took a look at the code and provided a laundry list of things that needed fixing or improving. I'm working on those now. The first change just went live this week - mod_virgule now requires the POST method for submitted forms. This minor change already stopped a couple of our automated account spammers who were creating accounts with GETs. Only the dumbest spammers were doing that I'd think. Using POST isn't much harder. More changes to come.&lt;/p&gt;&lt;p&gt;If you're wondering what caused the increase in spam accounts we've been seeing for the last year, here's a possible contributor: Incansoft, apparently a purveyor of web-based spam tools, added an Advogato attack to a spamming tool they sell called Web20Bot (sorry, not going to link to it but you can google it). Web20Bot will create phony account profiles containing your backlink spam on 20 websites including Advogato.org, squidoo.com, wordpress.com, blogger.com, tumblr.com, and livejournal.com. They claim Web20Bot handles email verification and captchas, so working out a defense may be interesting. I doubt any of their spam lasts more than 48 hours around here anyway but it would be nice to make life harder for them. 
(incidentally, if someone were to come up with a copy of this thing so we could analyze it, that might be cool - maybe we could help other sites being attacked by it too).&lt;/p&gt;&lt;p&gt;&lt;b&gt;Update:&lt;/b&gt; Thanks for &lt;a href="http://advogato.org/person/redi/diary/243.html" &gt;pointing out those issues, Redi&lt;/a&gt;. I've fixed the diary edit problem, it should not have been checking for a POST. The &amp;lt;person&amp;gt;, &amp;lt;project&amp;gt;, and &amp;lt;wiki&amp;gt; tags were special cases in the old HTML handler. If one is broken, all three probably are. I'll get on that now. It will take me a little while to track down the problem. &amp;lt;proj&amp;gt; was deprecated in favor of &amp;lt;project&amp;gt; way back in the Raph days but the code checking for &amp;lt;proj&amp;gt; wasn't dropped until this most recent update. I didn't realize anyone still used it. I can add it back in.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Update 2:&lt;/b&gt; Ok, found the problem. The old tag handlers output directly to the apache buffer while the new handlers modify the XML tree, which is rendered to the buffer later. I need to modify or replace the handlers for those three tags. I'll try to get to it today if time allows.&lt;/p&gt;&lt;p&gt;&lt;b&gt;Update 3:&lt;/b&gt; I think the special tag issue is fixed now, let's try this code for a day or so and see if any problems show up.&lt;/p&gt;&amp;lt;person&amp;gt; test:  &lt;person&gt;redi&lt;/person&gt;&lt;br/&gt;&lt;br/&gt;
&amp;lt;proj&amp;gt; test: &lt;proj&gt;mod_virgule&lt;/proj&gt;&lt;br/&gt;&lt;br/&gt;
&amp;lt;project&amp;gt; test: &lt;project&gt;mod_virgule&lt;/project&gt;&lt;br/&gt;&lt;br/&gt;
&amp;lt;wiki&amp;gt; test: &lt;wiki&gt;WikiPedia:Advogato.org&lt;/wiki&gt;&lt;br/&gt;&lt;br/&gt;
</description>
    </item>
    <item>
      <pubDate>Wed, 21 Jan 2009 19:33:28 GMT</pubDate>
      <title>21 Jan 2009</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=32</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=32</guid>
      <description>&lt;p&gt;&lt;b&gt;Watch for Spammers&lt;/b&gt;&#xD;
&lt;p&gt;If you're wondering about the source of the recent&#xD;
increase in phony users signing up for Advogato accounts, I&#xD;
think I've found it. A number of Russian&#xD;
SEO/spammer blogs are discussing a list of websites that&#xD;
seem to be highly trusted by Google based on the ratio of&#xD;
pages in the main Google index to the &lt;a href="http://www.mattcutts.com/blog/indexing-timeline/" &gt;supplemental&#xD;
Google&#xD;
index&lt;/a&gt;. Advogato is #16 on the list. (I'd provide some&#xD;
links but giving them links&#xD;
from Advogato is the last thing we should do. If you're&#xD;
curious you should be able to find them using a site like&#xD;
&lt;a href="http://technorati.com/" &gt;Technorati&lt;/a&gt; to find&#xD;
blogs that have linked to Advogato in the&#xD;
last few weeks.) &#xD;
&lt;p&gt;A side effect has been a big bandwidth hit. I thought at&#xD;
first we'd been slashdotted. But the main result is a rash&#xD;
of SEO spammers signing up for Advogato accounts and trying&#xD;
to find some way to get backlinks to their link farms and spam&#xD;
sites. Average survival time for their profiles has been&#xD;
less than 48 hours so probably nothing to worry about but&#xD;
everyone should take a look at the "recent people joining"&#xD;
list and flag anyone who looks like spam. Hopefully it will&#xD;
die down in a week or two.</description>
    </item>
    <item>
      <pubDate>Sun, 24 Feb 2008 00:15:50 GMT</pubDate>
      <title>24 Feb 2008</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=31</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=31</guid>
      <description>Test post for the libxml2 HTML parser&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; In theory, the libxml2 HTML parser should make best&#xD;
guesses on how to fix screwed up, illegal HTML and all tags&#xD;
should get closed at the end of this diary entry, preventing&#xD;
problems in diary entries that follow or elsewhere on the page.&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; &lt;b&gt;bold tag with no close&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; &lt;i&gt;italics tag with no close&#xD;
&#xD;
&lt;p&gt; &lt;p&gt; &lt;strike&gt;strike tag with no close&lt;/strike&gt;&lt;/i&gt;&lt;/b&gt;&#xD;
&#xD;
&lt;p&gt; Update Jan 2009: after a long downtime, I'm finally working&#xD;
on the HTML parser again. Should have it live this month!</description>
    </item>
    <item>
      <pubDate>Thu, 10 Jan 2008 17:36:03 GMT</pubDate>
      <title>10 Jan 2008</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=30</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=30</guid>
      <description>&lt;p&gt;&lt;b&gt;Advogato Status Report&lt;/b&gt;&#xD;
&lt;p&gt;&#xD;
My New Year's resolution is to start doing monthly&#xD;
status reports again! Here's the first one. &#xD;
&#xD;
&lt;p&gt; Even though I haven't posted a status update in a while,&#xD;
minor code updates have continued. To find out what's&#xD;
changed in the live &lt;a href="http://www.advogato.org/proj/mod_virgule/" &gt;mod_virgule&#xD;
code&lt;/a&gt; running Advogato, see the &lt;a href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup" &gt;changelog&lt;/a&gt;.&#xD;
It's always there and nearly always up to date. &#xD;
&#xD;
&lt;p&gt; The biggest change has been in the XML file store locking&#xD;
code. The previous system relied on a site-wide read/write&#xD;
lock that locked out access to the entire database when&#xD;
writes were happening. This was getting to be a problem&#xD;
because of trust recalculations and diary syndication that&#xD;
happen at the top of the hour. Write locks were often&#xD;
clogging things up for 10 to 15 minutes per hour. &#xD;
&#xD;
&lt;p&gt; But it's all good now. All the locking code has been totally&#xD;
ripped out and replaced with file-level locking. There&#xD;
should almost never be any detectable site delays caused by&#xD;
locking now. Besides fixing the hourly slowdowns, this&#xD;
also gives us a little more breathing room to continue growing.&#xD;
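&lt;p&gt; To illustrate the difference, here is a minimal Python sketch of file-level locking. It is not the mod_virgule code (which is C and is not shown here); the function names, the file name, and the use of flock() are all assumptions made for the example. The point is only that each lock covers a single file, so a writer delays other users of that one file rather than the whole store.

```python
import fcntl
import os
import tempfile


def write_locked(path, data):
    """Replace the contents of one file while holding an exclusive lock on it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks only other users of this file
        try:
            f.seek(0)
            f.truncate()
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def read_locked(path):
    """Read one file under a shared lock; other readers are not blocked."""
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical store directory and record name, for demonstration only.
    store = tempfile.mkdtemp()
    profile = os.path.join(store, "profile.xml")
    write_locked(profile, "example record")
    print(read_locked(profile))
```

&lt;p&gt; With a scheme along these lines, a long-running job such as an hourly trust recalculation only holds locks on the files it is actually rewriting, so requests for other files can proceed.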
&#xD;
&lt;p&gt; Another recent change is a patch from &lt;a href="http://www.advogato.org/person/fzort/" &gt;fzort&lt;/a&gt; that&#xD;
improves the HTML parsing code to eliminate undesirable tag&#xD;
attributes. The long-term plan is still switching to&#xD;
libxml2's HTML parser and junking the one in mod_virgule&#xD;
but, until then, this should make things a little more secure.&#xD;
&#xD;
&lt;p&gt; A few other fixes and improvements:&#xD;
&#xD;
&lt;p&gt; The GUID of syndicated blog posts is now preserved when they&#xD;
go out on the&#xD;
Advogato diary RSS feed.&#xD;
&#xD;
&lt;p&gt; Mod_virgule now has built in support for Google Analytics.&#xD;
Drop your GA ID code into the config.xml and the appropriate&#xD;
GA markup appears on every page throughout the site.&#xD;
&#xD;
&lt;p&gt; &lt;a href="http://www.advogato.org/person/presbrey/" &gt;Joe&#xD;
Presbrey&lt;/a&gt; of MIT contributed a patch for an external FOAF&#xD;
URI on the user profile. This allows you to link your&#xD;
Advogato FOAF to any other existing FOAF profile you may&#xD;
have, helping to consolidate your online identity. &#xD;
&#xD;
&lt;p&gt; The computed trust level for each user is now exported via&#xD;
FOAF, referencing a local RDF schema that describes the&#xD;
trust levels. This mechanism was suggested by Sean B. Palmer&#xD;
and &lt;a href="http://www.advogato.org/person/connolly/" &gt;Dan&#xD;
Connolly&lt;/a&gt; on the W3C &lt;a href="irc://irc.freenode.net/#swig" &gt;#swig IRC channel&lt;/a&gt;.&#xD;
</description>
    </item>
    <item>
      <pubDate>Fri, 31 Aug 2007 23:29:23 GMT</pubDate>
      <title>31 Aug 2007</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=29</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=29</guid>
      <description>&lt;p&gt;&lt;b&gt;Advogato Status Report&lt;/b&gt;&#xD;
&lt;p&gt;A new rev of &lt;a&#xD;
href="http://www.advogato.org/proj/mod_virgule/"&gt;mod_virgule&#xD;
code&lt;/a&gt; is live on Advogato. See the &lt;a&#xD;
href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup"&gt;changelog&lt;/a&gt;&#xD;
for the details. Here are a few highlights.&#xD;
&lt;p&gt;&#xD;
A discussion between &lt;a&#xD;
href="http://www.advogato.org/person/ncm/"&gt;ncm&lt;/a&gt;, &lt;a&#xD;
href="http://www.advogato.org/person/raph/"&gt;raph&lt;/a&gt;, and &lt;a&#xD;
href="http://www.advogato.org/person/chrisd/"&gt;chrisd&lt;/a&gt;&#xD;
speculated on why there seemed to be a decline in Google&#xD;
rankings for individual blog content on Advogato lately. It&#xD;
was suggested that a change in the Google ranking algorithm&#xD;
may be placing less value on pages with dynamic URLs like&#xD;
&lt;a&#xD;
href="http://www.advogato.org/person/ncm/diary.html?start=191"&gt;http://www.advogato.org/person/ncm/diary.html?start=191&lt;/a&gt;.&#xD;
Advogato has long had static URLs for individual articles,&#xD;
so I've added similar support for each individual blog post.&#xD;
If you click the permalink marker beside one of your blog&#xD;
posts, you'll see it now goes to a static URL with just that&#xD;
one post on the page instead of to a dynamic URL that&#xD;
includes a range of posts. For example: &lt;a&#xD;
href="http://www.advogato.org/person/ncm/diary/190.html"&gt;http://www.advogato.org/person/ncm/diary/190.html&lt;/a&gt;.&#xD;
 The old, dynamic system is still in place so search engines&#xD;
and existing links will get to the right place, of&#xD;
course. There's another advantage to having the static URLs&#xD;
to individual blog entries. These will be used for comment&#xD;
pages eventually. Yes, blog comments are really coming. I&#xD;
promise. Some day.&#xD;
&lt;p&gt;&#xD;
There's also a fix to a minor foaf:mbox_sha1sum bug that was&#xD;
noticed by &lt;a href="http://harth.org/andreas/" &gt;Andreas&#xD;
Harth&lt;/a&gt;.&#xD;
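&lt;p&gt; For anyone unfamiliar with the property: the FOAF vocabulary defines foaf:mbox_sha1sum as the SHA-1 hash of the mailto: URI for a mailbox, written as hex. The post does not say what the bug was, so the Python sketch below only shows how the value is normally computed. The function name and the example address are made up for illustration and are not taken from mod_virgule.

```python
import hashlib


def mbox_sha1sum(email):
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    # The hash covers the whole URI, including the "mailto:" scheme prefix.
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical address, for demonstration only.
    print(mbox_sha1sum("alice@example.org"))
```

&lt;p&gt; Hashing the bare address without the mailto: prefix gives a different digest, which is an easy mistake to make when generating or comparing these values.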
&lt;p&gt;&#xD;
You may have noticed that our Italian cittaditorino spammers&#xD;
were back with a vengeance the last couple of weeks. The&#xD;
community spam flagging system seems to be controlling them.&#xD;
Most of the bogus accounts are being deleted within a few&#xD;
days of creation. At ncm's suggestion, I've added&#xD;
rel="nofollow" attributes to all links to untrusted users&#xD;
in the recentlog,&#xD;
recent people joining list, and Advogato People index. &#xD;
There were already nofollows on all links created by&#xD;
untrusted users but this new addition should prevent search&#xD;
engines from even indexing their profile and blog pages.&#xD;
With all these spam control measures in place, keep in mind&#xD;
it's a little harder than it used to be for real users&#xD;
to create an Advogato account and get certified. Well-known&#xD;
users aren't having much trouble and the new trust injected&#xD;
by adding &lt;a&#xD;
href="http://www.advogato.org/person/mako/"&gt;mako&lt;/a&gt; as a&#xD;
seed has helped tremendously.  But there&#xD;
are users here and there who haven't collected enough&#xD;
certs to become trusted, like &lt;a&#xD;
href="http://www.advogato.org/person/pabs3/"&gt;pabs3&lt;/a&gt;.&#xD;
&lt;p&gt;&#xD;
&lt;p&gt;That's all the news for now but more new features are on&#xD;
the way.</description>
    </item>
    <item>
      <pubDate>Wed, 1 Aug 2007 15:46:30 GMT</pubDate>
      <title>1 Aug 2007</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=28</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=28</guid>
      <description>The URL rendering bug that &lt;a&#xD;
href="http://www.advogato.org/person/redi/diary.html?start=102"&gt;redi&#xD;
spotted&lt;/a&gt; has been fixed, I think. Looks like it was an&#xD;
artifact of the Apache APR 1.3 to 2.0 upgrade that had gone&#xD;
unnoticed for quite a while. If anyone spots any other URL&#xD;
issues in the project section, let me know.</description>
    </item>
    <item>
      <pubDate>Mon, 30 Jul 2007 18:29:13 GMT</pubDate>
      <title>30 Jul 2007</title>
      <link>http://www.advogato.org/person/robogato/diary.html?start=27</link>
      <guid>http://www.advogato.org/person/robogato/diary.html?start=27</guid>
      <description>&lt;p&gt;&lt;b&gt;Advogato Status Report&lt;/b&gt;&#xD;
&lt;p&gt;A new rev of &lt;a&#xD;
href="http://www.advogato.org/proj/mod_virgule/"&gt;mod_virgule&#xD;
code&lt;/a&gt; is live on Advogato. See the &lt;a&#xD;
href="http://svn.dprg.org/viewvc/mod_virgule/trunk/ChangeLog?view=markup"&gt;changelog&lt;/a&gt;&#xD;
for the details.&#xD;
&lt;p&gt;Aside from the usual minor bugfixes and tweaks, there are&#xD;
two new features you may have noticed already.&#xD;
&lt;p&gt;&lt;b&gt;New certification indicators:&lt;/b&gt; A visual&#xD;
indication is now added to trust certifications that are&#xD;
less than&#xD;
30 days old. This should make it easier to spot new certs on&#xD;
the user profiles. You can check this out on your own user&#xD;
profile if you've certified anyone, or been certified by&#xD;
anyone, in the last 30 days.&#xD;
&lt;p&gt;&lt;b&gt;Article lists:&lt;/b&gt; Ever wonder how many Advogato&#xD;
articles you've posted? Or wanted to read other articles by&#xD;
a particular poster? Each user profile now includes a&#xD;
reverse chronological list of the 10 most recent articles&#xD;
posted by that user. For &lt;a&#xD;
href="http://www.advogato.org/person/lkcl/"&gt;users who are&#xD;
more prolific&lt;/a&gt;, there is a link to a separate page that&#xD;
includes a &lt;a&#xD;
href="http://www.advogato.org/person/lkcl/articles.html"&gt;complete&#xD;
listing&lt;/a&gt;&#xD;
of all articles posted by that user.&#xD;
&lt;p&gt;In addition to providing a new way to explore Advogato's&#xD;
articles, this should provide another direct route for&#xD;
search engine robots to find the static links to the&#xD;
articles.</description>
    </item>
  </channel>
</rss>
