Mark Gibbs ponders how to analyze Twitter for a specific search term using the Twitter search API and Microsoft Excel.
Excel is one of those programs with so much depth that there are whole areas many of us will never come to grips with. One such area is Excel's support for XML through the program's XML maps feature. XML maps are a powerful tool if you can figure out how they work so this week we're going to do exactly that. Or at least, something like that.
Let's create a problem: You've been asked by the CEO (which means you've been commanded) to find out whether people are talking about the company's new product on Twitter. The CEO wants to see daily "visibility" reports because he's hoping he'll get a sense of how effective the PR campaigns are.
So, first of all let's check out the Twitter API documentation. Twitter offers a whole galaxy of API functionality but the interface that allows us to grab the public timeline only samples the last 20 Tweets (Twitter messages), which is hardly a representative sample. In fact, to get direct access to the complete public timeline we'd have to make special arrangements with Twitter management, so let's look for a different strategy.
Twitter actually provides an API that can simplify our problem: The Twitter Search API lets you create a search and generate an RSS feed for a specific search.
<digression>I must briefly applaud WowWee for the Rovio. This is an amazing toy, er, robotics experimentation platform with a full and rich API and a ton of slick technology. You might ‘need' one in your office. The Rovio gets a 5 out of 5!</digression>
If we go to the Twitter Search service we can try a search for "rovio" and, using the advanced search, ask for the date range from Feb. 1 to March 24 (the logic for this is that I'm writing this on March 24).
The result of this search will be, by default, a list of the last 15 items. If you change this in the advanced search dialog to, say, the maximum of 50 (which is odd as the documentation says that the maximum is actually 100) the new default will be defined by a cookie setting and this leads to a problem.
While the resultant RSS feed will list the correct number of items when we access it from our browser (the cookie will be returned defining the number of items), we need a real search URL that can run from any process without needing the cookie data. This requires we modify the search URL from: http://search.twitter.com/search.atom?q=+rovio+since%3A2009-02-01+until%3A2009-03-24 (to:) http://search.twitter.com/search.atom?q=rovio&since%3A2009-02-01&until%3A2009-03-24&rpp=50&page=1
I replaced all of the "+" signs separating the arguments (which do not conform to URL encoding standards) with ampersands ("&" – the more usual argument separator for HTTP requests), added the results per page (rpp), and added a page argument of 1.
Before you ask, no, I have no idea why I couldn't have appended "+rpp=50" in the Twitter search URL, but there we have it – the wonders of Twitter's documentation (or rather lack of it) and the search interface.
Anyway, now we have a way of requesting an Atom formatted RSS feed and we're ready to access and analyze the data using Excel … which we'll start on next week.