In the age of Big Data, one of the things we all do over and over again is mine the Web for content we need to repurpose. While finding the stuff you want can be a big job, getting that same content into a usable, useful format all too often becomes a Herculean task.
If this sounds like one of the monkeys on your back, there’s a solution you should check out. Called import.io, this application is designed to analyze Web pages, discover their structure, and extract the data items you want into a tabular or other structured format.
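To make "discover their structure" a little more concrete, here is a toy heuristic in Python. It is not import.io's algorithm, which isn't described here; it is a sketch under two simple assumptions: records are repeated elements that share a CSS class, and each record's fields are child elements with classes of their own. The names `extract_table`, `_TreeBuilder`, and the sample markup are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class _Node:
    """A minimal element: tag, class attribute, own text, and children."""
    def __init__(self, tag, cls, parent):
        self.tag, self.cls, self.parent = tag, cls, parent
        self.children, self.text = [], []


class _TreeBuilder(HTMLParser):
    """Builds a bare-bones element tree from (possibly sloppy) HTML."""
    VOID = {"br", "img", "input", "meta", "link", "hr"}

    def __init__(self):
        super().__init__()
        self.root = _Node("root", None, None)
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = _Node(tag, dict(attrs).get("class"), self.cur)
        self.cur.children.append(node)
        if tag not in self.VOID:  # void elements never get a closing tag
            self.cur = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray closing tags.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())


def _walk(node):
    """Pre-order traversal of the tree."""
    yield node
    for child in node.children:
        yield from _walk(child)


def _all_text(node):
    """Own text first, then descendants' text (a simplification)."""
    parts = list(node.text)
    for child in node.children:
        child_text = _all_text(child)
        if child_text:
            parts.append(child_text)
    return " ".join(parts)


def extract_table(html):
    """Guess the repeated 'record' class and return one dict per record."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    # Candidates: classed elements whose children are also classed.
    counts = Counter(
        n.cls for n in _walk(builder.root)
        if n.cls and any(c.cls for c in n.children)
    )
    if not counts:
        return []
    record_cls, _ = counts.most_common(1)[0]  # most repeated candidate wins
    return [
        {c.cls: _all_text(c) for c in n.children if c.cls}
        for n in _walk(builder.root)
        if n.cls == record_cls
    ]
```

Fed a `<ul>` holding two `<li class="item">` entries, each with `name` and `price` spans, this returns one dictionary per item, such as `{"name": "Chair A", "price": "49"}`. Real pages are far messier (inconsistent classes, nested layouts, content built by scripts), which is exactly why a dedicated tool earns its keep.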
import.io can operate in “Magic” mode, where you point it at a URL and it slices and dices the content to produce a table automatically. For example, here’s a search on the Ikea website for “chair” (note that, for no obvious reason, Ikea’s search results include things that aren’t remotely like chairs):
… and here are the results from import.io:
That's impressive! You can download the table as a CSV file or turn the process into a REST API that can be called when needed, for example, to update a database. The "Magic Api" page also provides options for re-running the query, downloading the results in JSON or tab-separated values (TSV) format, running the query against a list of URLs (“Bulk Extraction”), calling the API with an HTTP GET or POST request, pushing the results to Google Sheets, and graphing the data with plot.ly, along with plenty of instructions for integrating import.io with other applications and languages.
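The REST API option is the piece most likely to end up in a script. The Python sketch below, using only the standard library, shows one way a scheduled job might pull the latest results and upsert them into a SQLite table. Everything specific here is an assumption: the endpoint URL is a placeholder for whatever your own "Magic Api" page shows, the response is assumed to be a JSON list of flat records with `name` and `price` fields, and `fetch_rows`, `store_rows`, `refresh`, and the `products` table are hypothetical names.

```python
import json
import sqlite3
import urllib.request


def fetch_rows(api_url):
    """GET the extractor's REST URL and decode the JSON body.

    Assumption: the body is a JSON list of flat objects with "name" and
    "price" keys. If your payload wraps the rows in another key, unwrap
    it here.
    """
    with urllib.request.urlopen(api_url) as resp:
        return json.load(resp)


def store_rows(conn, rows):
    """Upsert rows into a 'products' table keyed by product name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (name TEXT PRIMARY KEY, price TEXT)"
    )
    # Named placeholders pull values from each dict; an existing name is overwritten.
    conn.executemany(
        "INSERT OR REPLACE INTO products (name, price) VALUES (:name, :price)",
        rows,
    )
    conn.commit()


def refresh(api_url, db_path="products.db"):
    """Fetch the latest extraction and write it to a local SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        store_rows(conn, fetch_rows(api_url))
    finally:
        conn.close()
```

A cron job could then call `refresh()` with your real API URL. `INSERT OR REPLACE` keeps prices current without duplicating products, but rows that vanish from the site stay in the table; a stricter sync would delete stale rows first.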
import.io also provides more complex extraction, web crawling, and data connection services, which work great but are surprisingly hard to figure out how to use; the company needs to take a long, hard look at its user experience design, because it’s way more opaque than it should be. That said, once you get to grips with the user interface, you can do some amazing things with Web site extraction.
The app, which uses a Web browser interface, is actually a mix of locally run functions and services executed remotely on import.io’s servers, whose output is displayed inside the app as app content. This is why, when you’re running the app in Magic mode and click the “thumbs down” icon to indicate that the extracted data isn’t what you want, the app displays “Looks like you’re going to need our Desktop App,” which is really confusing and makes no sense (surely someone at the company must have noticed this?).
Those complaints aside, this is a powerful and hugely valuable app, particularly as it is, surprisingly, completely free (for enterprise-scale use, you should talk to the company)!
Check out the import.io demos; they’re impressive. If the data extraction monkey is part of your corporate circus, this app could save you a lot of pain and effort.