RSS technology, take 2

So where were we? Oh yes, the Really Simple Syndication system - last week was a veritable banquet of RSS featuring a smorgasbord of standards, a panoply of products and other alluring alliterations.

We broke off in the middle of discussing how a news aggregator with the whimsical name of Syndirella goes about reducing the bandwidth it uses when downloading news feeds.

The reason that this matters, as we pointed out, is that should 20,000 people download a 50K-byte RSS file from some lucky site once an hour, it would require about 24G bytes of data transfer every day (20,000 downloads of 50K bytes is roughly 1G byte an hour, 24 hours a day). If the feed were updated only twice per day, this would be a profligate, unforgivable and rather expensive waste of bits.

The answer is simple yet subtle, profound yet passé, logical yet laughably geeky. The answer is Conditional GET, an HTTP feature that can significantly reduce the total transfer volume by telling you whether the content you request has changed.

Conditional GET is implemented as two fields in the response header: Last-Modified and ETag. What matters is whether these fields have changed since you last looked at them rather than what their values actually are.

To use these when you request content from the server, you include two fields in the HTTP request header. First there's an If-Modified-Since field containing the value from the Last-Modified header you received (or 0 if you have never retrieved the feed before). Second, there's an If-None-Match header field with the value from the ETag header (or 0 if never before retrieved).

If the content has changed (that is, the RSS file has been updated since you last downloaded it), the server will respond by sending you the new RSS file's content.

On the other hand, if the content has not been changed, the server will respond with a 304 code, which means "Not Modified," and the body of the reply will be empty.
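For the gearheads who want to see the exchange in code, here is a minimal sketch of a Conditional GET client using Python's standard urllib module. The function names (conditional_headers, fetch_feed) are made up for illustration and have nothing to do with how Syndirella does it. One assumption to flag: on a first fetch this sketch simply leaves the two request fields out rather than sending 0.

```python
import urllib.error
import urllib.request


def conditional_headers(last_modified=None, etag=None):
    """Build request headers from the values saved on the previous fetch."""
    headers = {}
    if last_modified:
        # Echo the server's own Last-Modified value back, unchanged.
        headers["If-Modified-Since"] = last_modified
    if etag:
        # Echo the server's own ETag value back, unchanged.
        headers["If-None-Match"] = etag
    return headers


def fetch_feed(url, last_modified=None, etag=None):
    """Return (body, last_modified, etag); body is None if nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request) as response:
            # 200: the feed changed, so keep the new values for next time.
            return (
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not Modified: empty body, so reuse the cached copy and values.
            return None, last_modified, etag
        raise
```

Note that urllib reports a 304 as an HTTPError, which is why the "nothing changed" case lives in the except branch. The caller stores whatever Last-Modified and ETag values come back and hands them in again on the next poll.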

Now why would you use the value from the Last-Modified and ETag fields rather than your own local date and time? You guessed it. The chances of your local clock being exactly synchronized with the remote Web server are as close to zero as are your chances of winning the state lottery without buying a ticket, so you could expect to always get the content returned.

And when we're considering RSS feeds and the Last-Modified and ETag fields, we have to be aware that the ETag value may have absolutely nothing to do with any time stamp that the server might generate - it is an opaque identifier, and a server could, for example, use a hash of the contents of the file.

Anyway, now that optimization is out of the way, what about that feature of Syndirella that lets regular Web pages be treated as if they were RSS content? The way it works is that Syndirella parses the HTML and pays attention to the tags you tell it have meaning. For example, you might specify the tags <span class="title">...</span> and <div class="body">...</div> to define the title and content for each feed item.

So Syndirella can turn a sow's ear into a silk purse. But how can we create silk purses out of non-RSS content generated by some program for consumption by a news aggregator that can't deal with sows' ears?
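To make the tag-driven approach to scraping HTML concrete, here is a rough sketch built on Python's standard html.parser module. It is not Syndirella's code; the ItemScraper class and its title_rule and body_rule parameters are hypothetical, and it assumes each title is followed by its body somewhere later on the page.

```python
from html.parser import HTMLParser


class ItemScraper(HTMLParser):
    """Collect title/body pairs from the tags the user marked as meaningful."""

    def __init__(self, title_rule=("span", "title"), body_rule=("div", "body")):
        super().__init__()
        self.rules = {"title": title_rule, "body": body_rule}
        self.items = []       # finished items, in page order
        self._field = None    # field currently being captured, if any
        self._depth = 0       # nesting depth of the captured tag name
        self._text = []       # text fragments seen inside the captured tag
        self._current = {}    # item under construction

    def handle_starttag(self, tag, attrs):
        if self._field:
            # Already capturing: only track nesting of the same tag name.
            if tag == self.rules[self._field][0]:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (rule_tag, rule_class) in self.rules.items():
            if tag == rule_tag and rule_class in classes:
                self._field, self._depth, self._text = field, 1, []
                break

    def handle_data(self, data):
        if self._field:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._field or tag != self.rules[self._field][0]:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current[self._field] = "".join(self._text).strip()
            self._field = None
            if "title" in self._current and "body" in self._current:
                self.items.append(self._current)
                self._current = {}


scraper = ItemScraper()
scraper.feed(
    '<span class="title">Hello</span>'
    '<div class="body">First <b>post</b></div>'
)
print(scraper.items)
```

Inner markup such as the bold tag is dropped and only its text is kept, which is one reasonable choice among several; a real aggregator might prefer to keep the HTML.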

Here's a neat idea on that theme: A free PHP script that checks a POP3, Internet Message Access Protocol (IMAP) or NNTP mailbox on demand and returns an RSS feed containing the messages in the mailbox. Called MailFeed, the script produces standards-compliant RSS 2.0 XML and requires PHP 4.3.4+ with the PHP IMAP extension and the Mail_Mime PEAR package (included by default with most PHP installs).

And then there are services to do it for you: Check out RSSgenr8, an HTML-to-RSS converter. You just modify the HTML on your site to include the tags ... around items to be listed in the feed.

RSSgenr8 takes the Web page title as the channel; the page's meta description as the channel description; the item text as the description element; and the first line or first 100 characters of text (any HTML coding is stripped out) as the title element.
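The mapping just described can be sketched in a few lines of Python. This is an illustration only, not RSSgenr8's actual PHP: derive_title and build_feed are invented names, the tag stripping is a crude regular expression, and the sketch assumes "first line or first 100 characters" means the first line capped at 100 characters. A complete RSS 2.0 channel also needs a link element, which is left out here for brevity.

```python
import re
import xml.etree.ElementTree as ET


def derive_title(item_html, limit=100):
    """First line of the tag-stripped text, capped at `limit` characters."""
    text = re.sub(r"<[^>]+>", "", item_html).strip()  # crude tag stripping
    first_line = text.splitlines()[0] if text else ""
    return first_line[:limit]


def build_feed(page_title, meta_description, items_html):
    """Assemble a minimal RSS 2.0 document from a page's marked items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = page_title
    ET.SubElement(channel, "description").text = meta_description
    for item_html in items_html:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = derive_title(item_html)
        # ElementTree escapes the markup, so the item text travels safely.
        ET.SubElement(item, "description").text = item_html
    return ET.tostring(rss, encoding="unicode")
```

Calling build_feed with the page title, the meta description and a list of item strings returns the feed as a single XML string.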

To create the feed you paste the target URL in a Web form on the project's home page and submit it; call the back-end PHP script directly (both of these services are free); or download the free script and run it on your own server.

Next week we'll wrap up RSS. Headlines to gearhead@gibbs.com.
