RSS technology, take 2

By , Network World
May 14, 2004 03:58 PM ET

Network World - So where were we? Oh yes, the Really Simple Syndication system - last week was a veritable banquet of RSS featuring a smorgasbord of standards, a panoply of products and other alluring alliterations.

We broke off in the middle of discussing how a news aggregator with the whimsical name of Syndirella goes about reducing the bandwidth it uses when downloading news feeds.

The reason that this matters, as we pointed out, is that should 20,000 people download a 50K-byte RSS file from some lucky site once an hour, it would require about 1G byte of data transfer every hour, or roughly 24G bytes every day. If the feed were updated only twice per day, this would be a profligate, unforgivable and rather expensive waste of bits.
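The arithmetic is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the subscriber count, feed size and polling rate come from the figures above, while the decimal units (1K = 1,000 bytes) and the decision to ignore the small cost of "nothing changed" replies are our assumptions.

```python
# Figures from the column; decimal units (1K = 1,000 bytes) are an assumption.
SUBSCRIBERS = 20_000
FEED_BYTES = 50_000
POLLS_PER_DAY = 24       # each subscriber polls once an hour
UPDATES_PER_DAY = 2      # the feed actually changes twice a day


def daily_bytes(full_downloads_per_day: int) -> int:
    """Bytes served per day if every subscriber does this many full downloads."""
    return SUBSCRIBERS * FEED_BYTES * full_downloads_per_day


# Every poll transfers the whole file.
unconditional = daily_bytes(POLLS_PER_DAY)

# Only polls that find new content transfer the file
# (the small size of the other replies is ignored here).
conditional = daily_bytes(UPDATES_PER_DAY)

print(f"every poll downloads the feed: {unconditional / 1e9:.0f} GB/day")
print(f"only changed feeds downloaded: {conditional / 1e9:.0f} GB/day")
```

Under those assumptions, downloading the file only when it has changed cuts the daily transfer to one-twelfth of the polling total.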

The answer is simple yet subtle, profound yet passé, logical yet laughably geeky. The answer is Conditional GET, an HTTP feature that can significantly reduce the total transfer volume by telling you whether the content you request has changed.

Conditional GET is implemented as two fields in the response header: Last-Modified and ETag. What matters is whether these fields have changed since you last looked at them rather than what their values actually are.

To use these when you request content from the server, you include two fields in the HTTP request header. First, there's an If-Modified-Since field containing the value from the Last-Modified header you received. Second, there's an If-None-Match field containing the value from the ETag header. If you have never retrieved the feed before, you have no values to send, so you leave both fields out and the server returns the full content.

If the content has changed (that is, the RSS file has been updated since you last downloaded it), the server will respond by sending you the new RSS file's content.

On the other hand, if the content has not changed, the server will respond with a 304 status code, which means "Not Modified," and the body of the reply will be empty.
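For the curious, the whole exchange can be sketched in Python using only the standard library. This is our own minimal illustration, not Syndirella's code: the function names conditional_headers and fetch_feed are hypothetical, and the sketch simply leaves the two request fields out when there are no saved values to send.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_headers(last_modified: Optional[str], etag: Optional[str]) -> dict:
    """Build the request fields from the values saved on the previous fetch."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def fetch_feed(
    url: str, last_modified: Optional[str] = None, etag: Optional[str] = None
) -> Tuple[Optional[bytes], Optional[str], Optional[str]]:
    """Return (body, last_modified, etag); body is None if nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request) as response:
            # Content is new or changed: keep the validators for next time.
            return (
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: empty body, so keep using the cached copy.
            return None, last_modified, etag
        raise
```

An aggregator would store the two returned values alongside its cached copy of the feed and hand them back on the next poll; a None body means the cached copy is still current.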

Now why would you use the value from the Last-Modified and ETag fields rather than your own local date and time? You guessed it. The chances of your local clock being exactly synchronized with the remote Web server are as close to zero as are your chances of winning the state lottery without buying a ticket, so you could expect to always get the content returned.

And when we're considering RSS feeds and the Last-Modified and ETag fields, we have to be aware that an ETag value need not be a date or time stamp at all. It is an opaque identifier that the server builds however it likes: some servers use a hash of the contents of the file, while the Apache server by default derives it from file attributes such as the inode number, size and modification time.

Anyway, now that optimization is out of the way, what about that feature of Syndirella that lets regular Web pages be treated as if they were RSS content? The way it works is that Syndirella parses the HTML and pays attention to the tags you tell it have meaning. For example, you might specify that the tags <span class="title">...</span> and <div class="body">...</div> define the title and content for each feed item.
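To make the idea concrete, here is a rough Python sketch of that kind of tag-driven scraping, built on the standard library's html.parser module. It shows the general technique only and is not how Syndirella itself is written; the ItemExtractor class and extract_items function are hypothetical names, and the sketch assumes each title appears before the body that belongs to it.

```python
from html.parser import HTMLParser


class ItemExtractor(HTMLParser):
    """Collect feed items from the tags the user has said carry meaning."""

    # (tag, class) pairs from the example above, mapped to item fields.
    FIELDS = {("span", "title"): "title", ("div", "body"): "body"}

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None   # field being captured, or None
        self._tag = None     # tag name that opened the capture
        self._depth = 0      # nesting depth of that same tag name
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1   # nested tag of the same name
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self.FIELDS.get((tag, cls))
            if field:
                self._field, self._tag = field, tag
                self._depth, self._text = 1, []
                return

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._text).split())
            # A title starts a new item; a body joins the latest item.
            if self._field == "title" or not self.items:
                self.items.append({"title": "", "body": ""})
            self.items[-1][self._field] = text
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)


def extract_items(html: str) -> list:
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

Calling extract_items on a page's HTML returns a list of dictionaries, one per item, each with a "title" and a "body" entry. Any markup inside the marked tags is flattened to plain text.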

So Syndirella can turn a sow's ear into a silk purse. But how can we create silk purses out of non-RSS content generated by some program for consumption by a news aggregator that can't deal with sows' ears?
