23 Sep 2005 tnt   » (Master)

RSS and Atom Feed Rediscovery
2005-09-23T01:10:30-07:00
Topics: Syndication  RSS  Atom  

How does a machine -- a piece of software, a web crawler, a feed aggregator -- find a blog's RSS or Atom feed‽ Humans might be able to figure out that the feed's URL is behind one of those orange "RSS", "Atom", or "XML" buttons, or behind one of the many other colorful button types out there. But how does a machine figure this out‽ The answer is RSS and Atom autodiscovery.
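As a rough illustration of what autodiscovery involves: a blog's homepage advertises its feed with a `<link rel="alternate">` tag whose `type` is `application/rss+xml` or `application/atom+xml`, and the machine scans the page for it. The sketch below is a minimal version of that scan, not anybody's reference implementation; the names `FeedLinkParser` and `discover_feeds` are made up for this example, and it ignores details such as an HTML `<base>` tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def discover_feeds(html, page_url):
    """Return absolute feed URLs advertised by the page, in document order.

    Assumption: relative hrefs are resolved against page_url only.
    """
    parser = FeedLinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]
```

Calling `discover_feeds(page_html, "http://joe-blow.example.com/")` on a page that carries such a tag returns the feed URLs it advertises; an empty list means the page advertises none.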

Matt Griffith had a great idea for RSS autodiscovery that was refined in Mark Pilgrim's article on RSS autodiscovery: a way for a machine to easily figure out the RSS (or Atom) feed of your blog. But what happens if your RSS or Atom feed moves‽ Maybe you moved it somewhere else on your site; maybe you moved it to a different server (with a different domain); maybe you want to have it hosted by a third-party feed management system (because they provide you with nice reports or something); or maybe you are even changing which third-party feed management system you use. What do the machines that have already autodiscovered your feed do‽ How do they find the new URL of your feed‽ That's where RSS and Atom feed rediscovery comes into play, as a complement to RSS and Atom autodiscovery. It fixes a problem we are facing with feeds: it lets bloggers change the URL of their feed without having to worry about losing subscribers.

The problem is that when software subscribes to a feed, a lot of it saves only the URL of the RSS or Atom feed and neglects to save the URL of the blog's homepage. This can be a major problem if a blog changes the URL of its RSS or Atom feed.

Consider: what happens if I change the URL of my RSS or Atom feed‽ How do the machines and people's software that are subscribed to my old RSS or Atom feed URL find the new one‽ How do they even know it's been moved‽ That's where RSS and Atom feed rediscovery comes in. And it's just a rule for machines to follow. The rule is:

When machines subscribe to an RSS or Atom feed, in addition to caching the URL of the RSS or Atom feed, they also save the URL of the blog's homepage. That way, if the URL they have cached for the RSS or Atom feed stops working, they can go to the URL of the blog's homepage, and perform RSS or Atom autodiscovery (again) to find the new URL of the blog's feed.

Also, periodically, even if the feed URL is working, they should go to the blog's homepage and rediscover the feed's URL. That way bloggers can gracefully migrate users to a new feed URL and not lose any subscribers.
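The rule above can be sketched in a few lines. This is only one way a subscriber might apply it, under some assumptions of mine: `Subscription`, `poll`, and `rediscover` are hypothetical names, the weekly `RECHECK_INTERVAL` is an arbitrary choice (the rule only says "periodically"), and `fetch` and `discover` are functions you pass in (`fetch(url)` returns the feed body or raises `OSError`; `discover(homepage_url)` returns a list of feed URLs found by autodiscovery).

```python
import time
from dataclasses import dataclass

RECHECK_INTERVAL = 7 * 24 * 3600  # assumed: re-run autodiscovery about weekly


@dataclass
class Subscription:
    feed_url: str
    homepage_url: str  # saved alongside the feed URL, as the rule requires
    last_rediscovery: float = 0.0


def rediscover(sub, discover, now):
    """Re-run autodiscovery on the homepage; return True if feed_url changed."""
    candidates = discover(sub.homepage_url)
    sub.last_rediscovery = now
    if candidates and sub.feed_url not in candidates:
        sub.feed_url = candidates[0]
        return True
    return False


def poll(sub, fetch, discover, now=None):
    """Fetch the feed, applying the rediscovery rule around the fetch."""
    now = time.time() if now is None else now

    # Periodic check, even while the cached feed URL still works.
    if now - sub.last_rediscovery >= RECHECK_INTERVAL:
        try:
            rediscover(sub, discover, now)
        except OSError:
            pass  # homepage unreachable right now; keep the cached feed URL

    try:
        return fetch(sub.feed_url)
    except OSError:
        # The cached feed URL stopped working: go back to the homepage.
        if rediscover(sub, discover, now):
            return fetch(sub.feed_url)
        raise
```

Passing `fetch` and `discover` in as parameters keeps the sketch independent of any particular HTTP library; a real aggregator would also want to honor HTTP redirects, which this does not cover.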

The question that may be coming to your mind now is: how does a machine figure out the URL of the blog's homepage‽ Well, there are a lot of different ways you could do it. But I'll provide you with just one easy way.

When you initially subscribe to a feed, you can find the URL to the blog's homepage via the RSS <link> element (under the <channel> element). For example:

    <?xml version="1.0"?>
    
    <rss version="2.0">
    
        <channel>
            <title>Joe Blow's Blog</title>
            <description>The weblog for Joe Blow</description>
            <link>http://joe-blow.example.com/</link>
    
            <item>
                <title>First Post!</title>
                <link>http://joe-blow.example.com/log/9778afc5-43b6-4ab5-a5d5-558290502bc3</link>
            </item>
    
        </channel>
    
    </rss>
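
Pulling the homepage URL out of an RSS 2.0 document like the one above takes only a few lines with Python's standard `xml.etree.ElementTree`. This is a minimal sketch; `rss_homepage` is a name I made up, and it assumes plain RSS 2.0 with un-namespaced elements (so not RSS 1.0/RDF).

```python
import xml.etree.ElementTree as ET


def rss_homepage(rss_text):
    """Return the text of <channel><link>, or None if it is missing or empty.

    Only the <link> directly under <channel> is considered, so the
    per-item <link> elements are never mistaken for the homepage.
    """
    root = ET.fromstring(rss_text)
    link = root.findtext("channel/link")
    return link.strip() if link and link.strip() else None
```

A subscriber would call this once when first subscribing and store the result next to the feed URL.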
            

Or if it's an Atom feed, then:

    <?xml version="1.0"?>
    
    <feed xmlns="http://www.w3.org/2005/Atom"
          xml:base="http://joe-blow.example.com/"
    >
        <id>urn:uuid:c63590c5-c0a4-48ce-ab99-1f2a5cf2a794</id>
        <title>Joe Blow's Blog</title>
        <link href="/" />
    
        <link rel="self" href="/feed" />
        <updated>2003-12-13T18:30:02Z</updated>
    
        <entry>
            <id>urn:uuid:9778afc5-43b6-4ab5-a5d5-558290502bc3</id>
            <title>First Post!</title>
            <link href="/log/9778afc5-43b6-4ab5-a5d5-558290502bc3" />
            <updated>2003-12-13T18:30:02-05:00</updated>
        </entry>
    
    </feed>
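
For Atom, the homepage is the feed-level `<link>` whose `rel` is `alternate` (which is also what a `<link>` with no `rel` means), and a relative `href` has to be resolved against `xml:base`. Here is a minimal sketch of that lookup; `atom_homepage` and its `feed_url` parameter are my own names, and it assumes `xml:base` is set only on the `<feed>` element, as in the example above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def atom_homepage(atom_text, feed_url=""):
    """Return the absolute URL of the feed-level alternate link, or None.

    feed_url is the URL the feed was fetched from; it is the fallback
    base when the <feed> element carries no xml:base.
    """
    root = ET.fromstring(atom_text)
    base = urljoin(feed_url, root.get(XML_BASE, ""))
    # Direct children only, so links inside <entry> are not considered.
    for link in root.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate" and link.get("href"):
            return urljoin(base, link.get("href"))
    return None
```

The `rel="self"` link is skipped on purpose: it points at the feed itself, which is exactly the URL that may go stale.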
