And why not RSS/PUT?

Posted 2 Mar 2005 at 16:17 UTC by garym

The boss-man is not a technical guy per-se, just the sort who thinks about things and asks those sorts of rhetorical questions that you're almost certain you could refute with a tome on number theory, but lacking that, you're really at a loss to do any otherwise than just give him his way and hope for vindication later.

This latest one may not actually be one of those times ...

It's one of those, one of these, half a dozen of the other scenarios, and it indirectly involves RSS because I think this may be a solution to my much broadcast prognosis on the fate of RSS, and put quite simply and direct, it goes like this:

why not distribute RSS via a 'Listener' pattern?

Why not indeed ...

RSS Observer

Think about it: the RSS use-case is really not suitable to the polling model that is crushing a lot of sites with needless traffic, Conditional-GET notwithstanding -- the correct paradigm of RSS use is Listener/Observer

Remembering HTTP Put

Here's how it works:

  1. You visit a website and see the RSS chicklet; instead of dragging and dropping it into your aggregator (or whatever it is you do) you click it -- and give your aggregator URL to the resulting form page. Maybe your browser could even make the aggregator URL a local profile option queried by the blog host via JavaScript.

  2. The host site, which could have many blogs on board, takes note of your aggregator URL and the pattern of the current (referring) page's RSS, and enters these into the Listeners list.

  3. When any blog hosted here updates, an event is triggered to scan the Listeners looking for pattern matches against the current blog's RSS link, queueing the list of target URLs.

  4. At the next opportunity (load permitting), the host site spawns HTTP-PUT requests piping the RSS contents out to the listener URLs!
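The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: the class and function names (ListenerRegistry, build_put, notify), the shell-style wildcard matching, and the application/rss+xml content type are all hypothetical choices, not anything specified in the post.

```python
import fnmatch
import urllib.request
from collections import defaultdict


class ListenerRegistry:
    """Maps feed-URL patterns to the aggregator URLs that asked to be notified."""

    def __init__(self):
        self._listeners = defaultdict(set)  # pattern -> set of listener URLs

    def subscribe(self, feed_pattern, listener_url):
        # Step 2: remember who wants which feed(s).
        self._listeners[feed_pattern].add(listener_url)

    def targets_for(self, feed_url):
        # Step 3: collect every listener whose pattern matches the updated feed.
        matched = set()
        for pattern, urls in self._listeners.items():
            if fnmatch.fnmatchcase(feed_url, pattern):
                matched |= urls
        return sorted(matched)


def build_put(listener_url, rss_bytes):
    # Step 4: an HTTP PUT request carrying the RSS document as its body.
    return urllib.request.Request(
        listener_url,
        data=rss_bytes,
        method="PUT",
        headers={"Content-Type": "application/rss+xml"},
    )


def notify(registry, feed_url, rss_bytes, send=urllib.request.urlopen):
    """Send the feed to every matching listener; returns the URLs attempted."""
    attempted = []
    for url in registry.targets_for(feed_url):
        try:
            send(build_put(url, rss_bytes))
        except OSError:
            pass  # unreachable listener: simply skipped in this sketch
        attempted.append(url)
    return attempted
```

The send parameter is there so the dispatch loop can be exercised without a network; a real host would more likely queue the targets and hand them to background workers, as the post goes on to suggest.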

fait accompli -- it's really nothing more than what the blogs are already doing with the Ping calls, just recognizing the reality of the low-cost of big-iron, and how most blogs are housed by server hardware more than capable of a parade of asynchronous ping-forks, ramped up to notify the entire subscriber base. All the benefits of real-time notification, in what is actually real-time, and for perhaps far less cost and bother, placing the burden of technical understanding of load/traffic issues on the already technically adroit server side.

Wonderfully retro

The McLuhanists will recognize the inherent retrieval aspect: This is just as it was back when you subscribed to site update notification with your email, only improved upon by being a low-level XML-based machine to machine transaction. It's the same story as the mailing-list, and similarly lends itself cleanly to cascading distributed channels such as NNTP or Jabber transport -- if the server architect is clever, the solution is eminently scalable by spawning asynchronous external exec system() calls on simple C-language workers to save on the annoying process blocking (that kills PING on MT) and conserve memory (100% reclaimed when the exec completes, so no high-watermark effects); the method can be distributed across a fleet of RSS broadcasting boxes through well-known and proven remote execution systems such as gexec.

And besides, we generally don't subscribe to each individual blog that crosses our daily intake path, we subscribe to other sites who have in turn subscribed to the originals; there is a natural fanning out of the load as we subscribe to a Technorati topic that collates perhaps hundreds of second-tier sources.

RSS for what it is

In terms of real-world use-cases, I've also found myself using a variety of RSS read-strategies

  • via email -- I have newspipe running an hourly scan of the Technorati search results on my blog URL as a workaround to get Trackback-like notifications of blog-cites, piped also through to my cell-phone for that instant You've been blogged! rush. RSS/PUT eliminates the need for newspipe because I could just subscribe with a mailto: callback.

  • as email/news -- Several feeds such as the Gutenberg new-acquisitions are new content summary alerts that lend themselves to being read by my GNUS nnrss; this presently is limited in use because a check of new email now implies a re-fetch of all the subscribed feeds, some of whom can be slow to arrive due to the size of the micro-content and the load on the remote server.

    RSS/PUT means my new acquisitions list would have arrived quietly when it was published, leaving GNUS with the far faster and simpler task of simply parsing the feed cache file.

  • via merged aggregations -- most of my RSS subscriptions go into a local Drupal aggregator where I produce a summary page by category giving me the latest items in each domain -- I do a similar thing in the sidebars of most of my Drupal-based websites, and this is a popular feature of those sites. In the present state of the art, there's an awkward Drupal cron script that scans the list of subscriptions on the clock-tick, compares to the registered refresh frequency and initiates the fetch on material suspected of expiry; only the actual fetching can tell for sure.

    With RSS/PUT, I simply give the URL of my PUT-handler: RSS/PUT eliminates needless requests, gets my web-host off my back for having this ubiquitous process in his top list, and I'm assured that every single one of my categories contains the very latest up-to-date contents. Since there's a low probability of any two sites updating at the same moment, the incoming load of HTTP PUT items is smeared more smoothly around the clock.

  • webwatching -- the venerable original use-case of RSS and the model chosen by most aggregators and web-based aggregators. I have some sites that I just like to keep tabs on, much like the F:F:F watching so obsessing the hero of William Gibson's latest; I have these sites feeding directly into my intranet portal sidebars so I know at a glance, when I happen to glance, if someone in The Community has proffered a post.

    With RSS/PUT, I have endless possibilities for customized actions to take given the incoming PUT-handler, voice-synth robotic town crier, ding a gong, hue an office Ishii-orb, whatever.

From the Consumer side of the equation, the push-process of RSS/PUT just makes so much sense, a recapturing of the You've Got Email thrill of the moment, a sense of the presence of the remote and unseen authors who make up such a portion of our modern on-line experience.

What say the preachers?

I posted this idea here on Advogato and not on my personal blog for a reason: I'm hoping someone might have done that missing-quantity numerical analysis which deftly explains to boss-man exactly why RSS/PUT is a dead-end, and why the numbers of signatur subscribers would of necessity exceed the cost/performance rationalities to make this scheme unworkable instead of merely heretical.

Then again, maybe it would work. Sure the number of processes would surge on any update, but the number of processes already surges on the 5-minute marks, and with our current state of the RSS art, 90% or more of those requests are pointless, innocent eagerness to get the Very Latest Thing only to be disappointed in the majority case by a Contents Unchanged header, or worse, yet another copy of an XML file that hasn't changed in the past several hundred requests.

Most blogs, I dare say, have a relatively small count of interested subscribers

"In the future, everyone will be famous to 15 people"
The blog-surveys tell us the average blog is updated only a few times each month yet the aggregators are almost universally set to poll hourly, 7x24 ... and many semi-power users seem to anneal to whatever gets provided as the minimum time-increment -- for these most-general cases at least the Listener/Observer pattern does seem the better choice, and the method is still scalable enough to accommodate even the most trafficked site like a Boing Boing or even the MSN journalists; given dramatic reduction in no-content requesting, who knows, it may even be practical to pursue the dream of total full-content micro-content publishing, the way RSS was meant to be.

So -- what say the preachers? Can anyone definitively shred this to bits? Am I totally off my keester?

Or is it so crazy, it just might work?

About ATOM diff-feeds, posted 2 Mar 2005 at 16:31 UTC by garym » (Master)

In my earlier posts on the troubles with RSS, more than one comment shot back with the ATOM plans to implement Last-Accessed diff feeds, basically the server checking the timestamp in the header of the incoming RSS request, matching this (in real-time) to the database of blog posts, and returning only those items modified or added since that last request. The scheme is intended to cut down on the needless resend of the whole RSS feed when in fact only the very top Item is necessary

This is another place where RSS/PUT simplifies our communications problems to the point of trivial: Each PUT would contain only the most recent content -- we'd lose the ability to 'catch up' on the past N posts from the blog, but in actual use-cases, at least in my experience, this is outside the scope of RSS; I'm interested in what is happening now and can't think of any real-world case where I wanted to backtrack to some earlier post, a situation doubly compounded given that stat saying most blogs see a post only twice in a month.

Thus the ATOM scheme isn't really a saving, and it complicates the code on the client side as much as on the server side in order to achieve the same result already inherent in RSS/PUT

About Firewalls, posted 2 Mar 2005 at 17:02 UTC by garym » (Master)

Yes, I know, I'm commenting on myself -- consider these as the footnotes ...

Another issue bubbling up as I think about this is the notion that RSS-Pull is perfectly acceptable to most firewall configurations because the Listener initiates the connection, but RSS/PUT violates the inbound traffic rules.

The obvious answer to this one is a complete agreement: Most people probably won't be allowed incoming HTTP-PUT events however the PUT method isn't the only valid URL you could use as a notification address. You could use mailto: to get through even a long and convoluted information path to your desktop, or you could cobble the bridging to allow new protocol extensions to channel the incoming feed via jabber: or icq:.

RSS Relay Service Providers, posted 2 Mar 2005 at 17:36 UTC by garym » (Master)

Just one more, this one inadvertently left out of the main post...

On the Producer/Host side we're asking for no more than what is already provided by the hosts who grant use of a ListMan or Majordomo, and while the notifications may not be guaranteed to be precisely real time, it's close enough for jazz and likely every bit as fast a notice as the average RSS-Reader gets with the default hourly poll, probably faster. What's more, it opens up a whole new industry of RSS-Relay Service Providers, sites that work like a meta-LiveJournal, not hosting your blog, just mirroring your RSS so you can reach an industrial-scale audience; for small operators on bandwidth-limited ISP-granted web allotments, RSS/PUT provides that magical means for true micro-content publishing, moving your bits out to an audience far wider than you might accommodate through your own resources.

Yes, it's a good idea., posted 3 Mar 2005 at 01:28 UTC by ping » (Master)

I agree with you. And so do others — check out KnowNow's event-driven RSS reader.

Consider another protocol, posted 3 Mar 2005 at 03:39 UTC by FarcePest » (Journeyer)

You've already noted the issue with firewalls, plus what happens if the listener is down for awhile? There's already a suitable asynchronous protocol available for this application that is widely-deployed, and it's called SMTP. You allude to this with your mailto: reference, but I think it is, by far, the easiest and most robust thing out there. Just apply some MIME (set your content-type and use an encoding that will be 8-bit safe) and it's just possible that some MUAs may even be able to display it directly, if you reference a suitable stylesheet, although Thunderbird doesn't seem to be able to render it. For a frightful example of this, see Randomly Spewing Spam. Otherwise, use some sort of filter to push it into your aggregator.

Additionally, there already exist some programs to control your subscription list, called "mailing list managers"... Then all your backend (blog software or whatever) has to do is send a single message to the list address.
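As a rough illustration of the "just apply some MIME" suggestion, here is a minimal sketch using Python's standard email package. The function name rss_as_email and every address in the usage note are made up for the example; text/xml with base64 is one reasonable reading of "set your content-type and use an encoding that will be 8-bit safe", not the only one.

```python
from email.message import EmailMessage


def rss_as_email(rss_text, list_address, sender, subject):
    """Wrap an RSS document in a MIME message addressed to a mailing list."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = subject
    # text/xml body, base64-encoded so 8-bit characters survive any relay.
    msg.set_content(rss_text, subtype="xml", charset="utf-8", cte="base64")
    return msg
```

The blog backend would build one such message per update and hand it to its mail server (for instance with smtplib's send_message), leaving the fan-out to the list manager.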

Sure, SMTP is far from perfect, and Dan Bernstein has even suggested using an alternative that makes senders responsible for message storage, sending short notifications that messages are available, but there is no implementation as of yet, nor does one seem likely any time soon. Spamming problems aside, SMTP works pretty well, since we've had a quarter of a century to tinker with it, and the tools are out there, so why reinvent it?

Why do people care so much about RSS?, posted 3 Mar 2005 at 09:51 UTC by davidw » (Master)

I don't really get all the thought being put into it. What's the big deal? What am I missing?

don't forget about security implications, posted 3 Mar 2005 at 19:24 UTC by jbuck » (Master)

As FarcePest writes: "Spamming problems aside ..."

I'm worried about the consequences of doing a PUT, or an email notification, because of the potential for attackers to direct large volumes of traffic at another party (yet another way to do a DoS attack, perhaps).

If the main rationale is to decrease the load on servers, maybe a protocol that resembles NNTP could be used; however, it would be simplified because a given feed would correspond to only one "newsgroup". The client asks for articles newer than a given sequence number, and is given all articles that qualify in RSS format. A client that pings too often would quickly be given a "no articles available" response; servers could be tuned to make this a very fast path.

davidw: we're basically building a new Usenet. I run Sage in Firefox, and by going through RSS I can quickly scan far more information than I could if I individually surfed each site, and I see almost no ads (though I fear that bloggers are going to be doing RSS ads soon).

Protocols and Security, posted 3 Mar 2005 at 22:08 UTC by garym » (Master)

Security is a very good point, but keep in mind that this is subscription based, not, as with email, an open free for all port. HTTP PUT by design allows for HTTP AUTH and can accommodate secure channels on HTTPS, and because you consciously opted in, the process could include registering the Signaller with your local listener -- since we tend to interact with a small number (less than 1000) RSS services, it's not a big deal to have your PUT handler verify the Referrer header against a whitelist of source IPs
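A PUT handler's gatekeeping might look something like the sketch below. It is hedged in several ways: it checks the connecting peer's IP address rather than a Referrer header, it assumes HTTP Basic auth, and the function name is_authorized plus the shape of its arguments are invented for illustration.

```python
import base64
import hmac


def is_authorized(headers, client_ip, allowed_ips, credentials):
    """Return True if the PUT comes from a whitelisted IP with valid Basic auth.

    headers is a dict of request headers; credentials maps user -> password.
    """
    if client_ip not in allowed_ips:
        return False
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        return False  # malformed base64 or not valid UTF-8
    user, _, password = decoded.partition(":")
    expected = credentials.get(user)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking the password through timing.
    return hmac.compare_digest(password.encode(), expected.encode())
```

Basic auth only makes sense here over HTTPS, which matches the point above about secure channels.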

I expect the bigger security issue will be the same general ignorance we find today in many corporate IT settings where the thought of allowing a port-handler causes neck-hairs to stand on end; I've encountered these issues before in trying to introduce listener services.

FWIW, I subscribe to the CNW press-release feed and a few others where I already receive deliberate spam, and I've had to opt out of the channels because their Track-back posting model was flooding my aggregators with Hold'em adverts.

There's nothing about RSS/PUT that considers what content gets into your feed; that's a problem best left to the discretion of the source and listener. If a feed is spammy, you simply unsubscribe, and if it was a trap that won't let you click here to REMOVE (ha ha) then the Referrer is easy to block. Contrast this to Email where the spammer simply changes source IP and tries again.

It is true that opening any inbound port, be it SMTP or NNTP or HTTP PUT, does leave you open to DoS attack. That's a very good point, and one with no easy solution, but a problem that is shared by all port-handlers (and probably why the IT neck-hairs raise at the suggestion!)

But DoS may not be a total show-stopper: In addition to opening a new industry of Feed Reflector Service Providers, we already see online aggregators like Yahoo offering to do your RSS collecting for you, so it's a small step to re-frame these services as personalized web-based RSS-Listeners. In fact, it reduces the load and simplifies the software for these web-aggregator service providers.

and aside to DavidW: What this RSS biz is all about is micro-content publishing, a webservices-based world where your personal website is not the exclusive channel for your creative work, but simply the home-base; what you publish there is relayed to other applications (web, mobile or desktop) where people can experience your work on their own terms.

The earliest and most compelling example was what we saw when Amazon began their Affiliate Program: Any website anywhere suddenly had access to compelling content they could reframe any way they saw fit to please their own local audience, and in exchange Amazon could reach into corners of the market who might be otherwise excluded -- not everyone enjoys a cluttered 3-column ad-laden everything under the sun e-commerce web portal :) For a personal example, years ago I started a simple and highly focussed online 'bookstore' to target mental-healthcare professionals with my own mini-reviews and bibliographies relating to the Japanese 'Quiet' Therapies (check the back of most Psych texts, count the number of non-white 'famous' psychologists ;) -- using Amazon's microcontent publishing in their primitive 'pull' process, I could offer direct links to the commercial publications, complete with prices as current as the most recent pull-run, and my site was well-received for many years until it was overtaken by other more capable services.

Amazon later introduced the 'webservices' model, but this was still a polling process -- the software I was using would only poll after any request just beyond my specified update interval, but that was caching code I had to write and, as you can imagine, this is not a fast-paced topic where prices fluctuate or wild sale-promotions dominate; a better model would be if Amazon simply sent me the updates when and if they happened. RSS/PUT would solve that.

I had named this RSS-Push in my early drafts of the main article, then opted for RSS/PUT as being more like some of the other RSS-based innovations like RSS/NNTP (or is it NNTP/RSS?) -- I may regret that as my own experiments very quickly realized how HTTP-PUT was only one of the many possible protocols. You can probably also tell that I'm not a marketing geek and have no nose whatsoever for snappy product names ;)

For the moment I'm sticking with it simply because my initial implementation, a sports-stats live-feed system for, uses libcurl where the actual option CURLOPT_UPLOAD has different low-level meanings depending on the protocol specified in the URL -- barely two days live in the field with version 0.1 and already we're contemplating replacing some of our legacy scripted-FTP mirroring by simply using an ftp://... target.

RSS MUA, posted 3 Mar 2005 at 22:25 UTC by garym » (Master)

FarcePest writes:

... it's just possible that some MUAs may even be able to display it directly ...

I do mention in my article that I use the GNUS nnrss inherent in the popular Emacs news/email reader; it's one of the countless really cannot live without features of GNUS that has kept me vendor-locked into that free software for almost 20 years (and the primary reason I have no experience with Thunderbird &c).

The difficulty with GNUS, though, is that it models RSS as just another POP/IMAP sort of service, to be polled when I press the Get New Mail button, and feeds like Technorati proved too painfully slow to allow subscribing to more than a handful.

I addressed this problem with newspipe, a polling RSS-to-email gateway that I run on a spare box down in the basement; this created an RSS/PUT simulation (via SMTP) where I could even selectively forward item summaries to my cellphone, and that's what got me thinking about technologies for retrieving the lost sense of immediacy and presence with the current state of the art -- newspipe is only as immediate as the last polling run, but when my phone chirps, I can still imagine that I've just, that very instant, been blogged.

KISS, posted 4 Mar 2005 at 02:23 UTC by eckes » (Master)

Web servers scale very well. They probably don't even notice the polling traffic. On the other hand, pushing is pretty complicated. You need to keep state, you need to handle dynamic client IPs, and you need to have an active process running on the server.

This is so complicated and bloated. You would need to have leases, handle retries, secure against false subscriptions and so on.

This is just too complicated to be worth it for 99% of all news feeds.

Use aggregator portals if you care about the traffic.

Greetings Bernd

KISS is exactly it, posted 4 Mar 2005 at 04:29 UTC by garym » (Master)

Bernd, I don't know what you envision, but my initial implementation has none of what you posit as requirements. There is no state, only a trigger on any new content posting, this trigger then spawns a quiet and low priority thread to scan the small list of subscribers; when each gets a hit, I fork a very tiny C-language PUT client giving it the new item content and the target PUT handler URL. Done, finished.

On the subscriber's side, there does need to be a port-handler, xinetd or whatever, but this is the same situation for Email, Jabber/ICQ or any other port handler, we only add one more, and I've already explained that this could be on a Web-based proxy that the firewalled can visit with their web-browser. The big gain being that every time you visit MyYahoo, it has guaranteed fresh content, up to the second.

As for scaling your webserver 'easily', I can see that you don't run an RSS-feed host, or if you do, perhaps not a busy one -- I didn't get all that coverage in Wired because I was worried, I got it because I'd exhaust my bandwidth half-way through the working day, and RSS-strain, which scales as a large-multiple factor of subscribers, was the clear culprit.

I don't buy the load argument, posted 5 Mar 2005 at 05:21 UTC by jbuck » (Master)

Yes, the way RSS works now it's too expensive; a typical setup will give each visitor the last N articles, no matter how often they poll. But the fact remains that the minimum level of work is proportional to the number of subscribers times the number of articles published, whether pull or push is used.

It seems that the key (again whether pull or push is used) is to avoid delivering an article multiple times, and to avoid a significant cost for too-frequent polling or for pushing articles that no recipient is around to read. Sure, you can push "guaranteed fresh content, up to the second" from machine A to machine B, but if machine B's only user is in bed, so what?

I envision a server that can keep a sequence number for each feed it provides in memory. If a client polls and there are no new articles, a response can instantly be delivered saying "no new articles" without hitting the disk. If there are, in fact, new articles, only those new articles are delivered.

Now, we can have ill-behaved clients that poll every second, but that kind of anti-social behavior can be detected and throttled.
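The in-memory sequence number idea could be sketched like this. The class name SequencedFeed and its methods are hypothetical, and articles are kept in a plain list purely for illustration; a real server would keep only the counter in memory and go to disk for the articles themselves.

```python
class SequencedFeed:
    """Keeps the latest sequence number in memory so empty polls stay cheap."""

    def __init__(self):
        self._articles = []  # article i has sequence number i + 1

    @property
    def latest(self):
        return len(self._articles)

    def publish(self, article):
        self._articles.append(article)
        return self.latest

    def newer_than(self, seq):
        """Return (latest sequence number, articles the client has not seen)."""
        # Fast path: nothing new, so answer without touching the article store.
        if seq >= self.latest:
            return self.latest, []
        return self.latest, self._articles[max(seq, 0):]
```

A client remembers the sequence number from its last response and sends it with the next poll, so each article is delivered to it at most once.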

Some notes, posted 7 Mar 2005 at 15:27 UTC by Malx » (Journeyer)

will tell about RSS pub/sub (jabber).

RSS technology does not limit you to GET only - you are free to implement public aggregates - is an excellent example of such an aggregate.

So you should consider these questions:

  • Do you need to get all notifications or only some of them? (for the second is chosen - you receive notifications only while you are online, with none delivered while you are offline. This is actually great!)
  • Do you need to put the full news inside the RSS, or just the subject and a link to the source? (It depends on the creator of the content and on advertisements.)
  • Do you need full control over the implementation? (You can't control aggregators, but you have full control of your own website and RSS.)
  • How do you pay for traffic? (It could be that you will have to pay more if you PUT that info than if you passively accept GETs.)
  • Antiflood (the user should have control over how much news he gets, and REMOVE is not a good way of controlling that).

BTW in early Netscape 4 there was PUSH technology for content delivery and history proved it to be bad :) It did not survive.

So I think RSS is just a great thing if you are not limited to RSS only. You should implement e-mail subscriptions, pub/sub, XML-RPC etc. in addition to RSS on your site. But RSS is great because of its simplicity and control.

With RSS, the killer is the factor, posted 7 Mar 2005 at 21:05 UTC by garym » (Master)

JBuck writes

"... the minimum level of work is proportional to the number of subscribers times the number of articles published, whether pull or push is used."

This may be the minimum, but my server logs show that it is not the typical -- most RSS readers either implement the RFC definition of Conditional-GET incorrectly, or they flush their ETag/Last-Modified database each time the software loads, so the empirical result I see across all those blogs where I have access to logs shows 90% of the requests come from broken readers who are downloading the full RSS feed every time, making the load a product of the number of subscribers times the polling intervals; the load becomes the size of the feed times S*24 for the well-behaved default-setting majority, and times S*24*12 for those 20% of semi-poweruser keeners on 5-minute schedules, with only maybe 10% of those leveraging Conditional-GET to only hit me for a few headers.
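To put rough numbers on that, here is a small worked calculation. The figures in the usage note (a 30 KB feed, 2 KB per item, 1000 subscribers, one post a day) are made-up inputs for illustration only, not measurements from any log.

```python
def daily_pull_bytes(feed_bytes, subscribers, polls_per_day):
    """Bytes served per day when every poll re-downloads the whole feed."""
    return feed_bytes * subscribers * polls_per_day


def daily_push_bytes(item_bytes, subscribers, posts_per_day):
    """Bytes sent per day when each new item goes once to each subscriber."""
    return item_bytes * subscribers * posts_per_day


hourly = daily_pull_bytes(30_000, 1_000, 24)            # 720,000,000 bytes
five_minute = daily_pull_bytes(30_000, 1_000, 24 * 12)  # 8,640,000,000 bytes
pushed = daily_push_bytes(2_000, 1_000, 1)              # 2,000,000 bytes
```

Under these assumed inputs, broken hourly pollers cost 360 times what a push of the single new item would, and 5-minute pollers cost 12 times more again; well-behaved Conditional-GET clients would shrink the pull figures considerably.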

Pub/Sub is another implementation of what I have done with HTTP-PUT except that it requires Jabber and the only existing edition is implemented as a polling aggregator which then pushes out whatever it found new on the last poll run.

About Conditional-GET, posted 7 Mar 2005 at 21:09 UTC by garym » (Master)

I should clarify also that Apache is also broken with respect to Conditional-GET, and that I implemented a work-around to the above load problems by similarly breaking my Conditional-GET -- the spec says the Last-Modified date comparison should be exact, not relational (== and not <) to allow for reverting old content, but the aggregators and Apache both take the lenient if-less-than approach; if you have a blog that keeps RSS as static flat-files, Apache will reduce your load for those aggregators that give their own server time as a reference instead of the time you gave them from your server on their last request.

Re: About Conditional-GET, posted 14 Mar 2005 at 05:14 UTC by jamesh » (Master)

Are you sure that the server should be checking for equality of the Last-Modified date? The spec seems to say otherwise with respect to the If-Modified-Since header:

c) If the variant has not been modified since a valid If-
   Modified-Since date, the server SHOULD return a 304 (Not
   Modified) response.

It does go on to say that some servers do not fully implement If-Modified-Since, and only send the 304 response if the dates match exactly (so it recommends that clients send back the Last-Modified date received earlier from the server). So Apache seems to be acting correctly.

If you want to handle the case of reverting a document to an older version (and getting an older Last-Modified date), then you should be sending an ETag with the document. The older version would have a different ETag, so if the client sends a If-None-Match header, it will get the correct version even if the If-Modified-Since header on its own would cause a 304 response.
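The interaction of the two validators can be sketched as below. This is a simplified illustration, not Apache's logic: the function name conditional_status is invented, weak ETags are ignored, and it follows the later HTTP rule that If-None-Match, when present, takes precedence over If-Modified-Since.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still good, else 200.

    etag is the current entity tag (a quoted string); last_modified is an
    HTTP-date string for the current version of the document.
    """
    client_etags = request_headers.get("If-None-Match")
    if client_etags is not None:
        # When If-None-Match is present, the date comparison is not consulted.
        tags = [t.strip() for t in client_etags.split(",")]
        return 304 if etag in tags or "*" in tags else 200

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            # Relational check: "not modified since", not exact equality.
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since):
                return 304
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full body
    return 200
```

With a reverted document, the date check alone would answer 304, but a client that also sends its old ETag gets a 200 and the correct version.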

You want RSS/Notify, not RSS/PUT, posted 14 Mar 2005 at 05:39 UTC by apenwarr » (Master)

This is actually all rather simple:

The problem with RSS/GET is that people poll your server over and over trying to see if anything is new, and it's not. Wasteful.

The problem with RSS/PUT is that you send notifications to people who may not care and may, in fact, have forgotten they subscribed and never read your data again. Or annoying people might subscribe others just to bother them. Also wasteful, but differently so.

Depending on load characteristics, one will be more wasteful than the other, but it depends which one.

So here's my suggestion: RSS/Notify. You sign up your server (which could be a publicly-provided cache server, or one provided by your ISP, or whatever; obviously not behind a firewall) for notifications with your favourite RSS feeds. When those feeds get updated, they notify your server about it - once. They just say *that* it changed, not *what* changed. This marks the cache on your server as expired, so next time you *do* want the information, assuming you do, you know that you have to download from the original source. When you do, you re-subscribe so you get the *next* notification.

Advantages of this method:

- if someone updates their RSS feed 15 times a day, and you only read once a day, you're only downloading the same data once, not 15 times. (It saves breaking the RSS into parts, which is complicated and error-prone.)

- if someone updates their RSS feed once a month, you only bother the server once a month.

- if you forget to unsubscribe, a particular server will not have the extra load after the first time. And you won't send the data to someone who doesn't want it even once.

- cache servers that don't actually want your feed (ie. weren't expecting the notification) will just ignore it, so you can't possibly spam them.

- instant notification of all changes (or rather, as often as you make your desktop application poll your cache server, which can be very often).
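The cache-server side of this scheme might look roughly like the following. Everything named here (NotifyCache, add_feed, on_notify, read, and the fetch/subscribe callables) is hypothetical; the wire format of the notification itself is left open, as it is in the proposal.

```python
class NotifyCache:
    """Cache server side of RSS/Notify: a one-shot 'changed' flag per feed."""

    def __init__(self, fetch, subscribe):
        self._fetch = fetch          # callable(feed_url) -> feed content
        self._subscribe = subscribe  # callable(feed_url): ask for the next notice
        self._content = {}
        self._expired = set()
        self._wanted = set()

    def add_feed(self, feed_url):
        self._wanted.add(feed_url)
        self._expired.add(feed_url)  # nothing cached yet

    def on_notify(self, feed_url):
        # Unexpected notifications are ignored, so they cannot be used to spam.
        if feed_url in self._wanted:
            self._expired.add(feed_url)

    def read(self, feed_url):
        if feed_url not in self._wanted:
            raise KeyError(feed_url)
        if feed_url in self._expired:
            # Clear the flag and re-subscribe before fetching, so a notice
            # that arrives mid-fetch leaves the feed marked expired again.
            self._expired.discard(feed_url)
            self._subscribe(feed_url)
            self._content[feed_url] = self._fetch(feed_url)
        return self._content[feed_url]
```

However many notifications arrive between two reads, the origin server sees a single download, which is the first advantage listed above.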

More KISS, posted 19 Mar 2005 at 06:29 UTC by eckes » (Master)

garym: I don't really understand. You talk about

I got it because I'd exhaust my bandwidth half-way through the working day, and RSS-strain, which scales as a large-multiple factor of subscribers, was the clear culprit.

and you say:

this trigger then spawns a quiet and low priority thread to scan the small list of subscribers; when each gets a hit, I fork a very tiny C-language PUT client giving it the new item content and the target PUT handler URL.

How does that match? In order to make your simple solution scale, you would need to keep a database of subscriptions (state), which has expiring entries (leases). Your low-priority thread will have to deal with unreachable destinations. You will have to do the DNS lookups. If you notify in a single thread it will take hours to contact all subscribers. Do you retry if the notification does not get through? (If not, your clients will still poll.)

If, however, you just notify some distribution servers (aggregators), then this is exactly what XML-RPC pings are used for in all recent blogging software. They will notify sites like technorati or about new articles. That way those sites are fresh to the second. This is an alternative distribution channel.

I still think the blogosphere has grown so successful exactly because RSS feeds are easy to set up and maintain; they are robust and scale well. If your RSS is created based on ETag and Last-Modified you can limit the traffic to a minimum. Aggregator sites that harvest a feed and syndicate it in user-specific lists greatly reduce the load and can be updated with pings.

A client visiting your site will most likely retrieve 1-5 HTML documents with 1-30 additional files (graphics, stylesheets). If this client visits your site once every day, he requests 1-150 files. If the same client polls every 30 minutes, he will request the same file about 50 times (and get a not-modified answer). This is a negligible load for sure.

Greetings Bernd

Because the aggregators are not well-behaved, posted 30 Mar 2005 at 22:47 UTC by garym » (Master)

Bernd, everything you say is true, but only in that imaginary landscape where everyone plays by the rules. In the real world, even the Radio Userland aggregators largely ignore your ETAG/Last-Modified headers, clearing their db every time they are loaded, using their own relative local time instead of the definitive and precise key given by the servers as per the RFC. As for retries and audit trails, you're right, I don't. No need. Like notifying, one message suits all recipients, and if they miss it, they miss it. I send out the notice, if it goes, it goes, if it doesn't, I pause the thread three times for a few minutes each run, then give up (logging the event) -- with current RSS, if you poll me and I'm down, you miss out too, so it's no worse -- in my production version, if you missed a notice, you can revert to a poll for your own personal "missed" list; clients check this file a few times a day. We're not talking about bank-transaction apps here.
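The "three pauses, then give up" policy can be sketched as follows. The function name push_with_retries, the 180-second pauses, and the injectable send and sleep callables are assumptions made for the example; the production code described above is a C client and is not shown in the thread.

```python
import time


def push_with_retries(send, targets, pauses=(180, 180, 180), sleep=time.sleep):
    """Try every target once, then once more after each pause; return the misses."""
    pending = list(targets)
    for pause in (0,) + tuple(pauses):
        if not pending:
            break
        if pause:
            sleep(pause)
        still_failing = []
        for url in pending:
            try:
                send(url)
            except OSError:
                still_failing.append(url)
        pending = still_failing
    # Whatever is left gets logged by the caller and can be added to each
    # subscriber's "missed" list for them to poll later.
    return pending
```

Only the failed targets are retried on each round, so a handful of dead listeners do not delay or duplicate delivery to everyone else.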

As with Trackback and Comments and their defenses against abuse (ie total lack thereof) the blogosphere grew because it existed in a collegial space, in the company of trusted friends, and no, it doesn't scale, not in any real sense of the word. Machine to machine among friends perhaps, but not in the same sense needed by someone like MSN, CNN or Yahoo, out among the faceless unwashed masses.

apenwarr, RSS/Notify is not a bad idea; the PUT file could be like OPML, a summary of blogs and their items. Your method does require a specialized reader to receive the Notify and act upon the contents, but it is worth pursuing if you have the time, especially for larger content such as podcasts and the emerging world of PSP-casting. Redundant fetching was fine and dandy when it was just text and maybe a few images, but do you really want to fetch a 10M videoblog post for the next 10 times until it rolls off the bottom of the RSS?
