
Analyzing online content with OpenAmplify

Extracting information from content is not as easy for computers.

Web Applications Alert By Mark Gibbs, Network World
June 30, 2009 12:01 AM ET

Making sense of Web content is mostly easy for humans but rarely easy for computers. Part of the problem is that most online content is unstructured data, so programmatically recognizing its "interesting" parts is very difficult.

For example, consider a sentence that you, a human, have no trouble understanding: "The man, who is 42, was charged with arson." You immediately know the man's age. Not so for a computer program, which would need a huge number of rules to interpret that sentence and discover his age. Even then, any one of thousands of minor variants, such as "The man, 42, was charged with arson," would most likely cause the same program to make a mistake.
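To see how brittle hand-written rules are, here is a minimal Python sketch built around a single, hypothetical extraction rule (a regular expression invented for this illustration, not taken from any real product). It finds the age in the first phrasing but misses the second:

```python
import re

# One deliberately naive, hypothetical rule: "<noun>, who is <number>".
AGE_RULE = re.compile(r"\b(?:man|woman|person), who is (\d{1,3})\b")

def extract_age(sentence):
    """Return the age as an int if the rule matches, otherwise None."""
    match = AGE_RULE.search(sentence)
    return int(match.group(1)) if match else None

print(extract_age("The man, who is 42, was charged with arson"))  # 42
print(extract_age("The man, 42, was charged with arson"))          # None
```

Handling the second phrasing means writing another rule, and every added rule is one more chance to misfire on a sentence nobody anticipated.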

And those aren't even tricky sentence constructions. Consider a sentence like "You'll know the melon is ripe when you can smell it walking into your kitchen." Read literally, the dangling modifier has the melon doing the walking, and it would take a truly stupendous program to gain a "deep" understanding of what the writer actually means.

That's the problem the Semantic Web is intended to address: adding explicit, machine-readable structure to Web content so that its meaning and intent are clear and easily discovered. Of course, only a small percentage of Web content is structured that way to date, and even if more of it were, casual communication between people will always be unstructured.

And this brings me to today's topic: How can we programmatically interpret online content to, for example, determine the prevailing sentiment of Twitter users who mention "iran"? The answer is to turn to linguistics.

A new service called OpenAmplify, published by Hapax LLC, uses "patented Natural Language Processing technology" that analyzes every word in a piece of text to identify the "significant topics, brands, people, perspectives, emotions, actions and timescales." Requests are made through a RESTful API, and output is available as XML, DoubleClick DART or JSON, which makes it well suited to programmatic analysis.
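Calling a RESTful service like this generally comes down to building an HTTP request that carries the text and the desired output format. The Python sketch below only constructs the request URL. The endpoint `https://api.example.com/amplify` and the parameter names `apiKey`, `inputText` and `outputFormat` are hypothetical placeholders, not OpenAmplify's documented interface:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.com/amplify"
SUPPORTED_FORMATS = {"xml", "json", "dart"}

def build_request_url(api_key, text, output_format="json"):
    """Build a GET URL that submits `text` for analysis."""
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    # urlencode percent-escapes the text (spaces become '+').
    query = urlencode({
        "apiKey": api_key,
        "inputText": text,
        "outputFormat": output_format,
    })
    return f"{BASE_URL}?{query}"

url = build_request_url("MY_KEY", "The man, 42, was charged with arson")
print(url)
```

A real client would more likely POST longer texts rather than pack them into a query string.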

The output enumerates "signals": structured representations of the meaning, intent, style and other characteristics of the text, which are weighted, ranked and organized.
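Because the signals arrive as structured data, a client can filter or re-sort them with a few lines of code. This Python sketch assumes a hypothetical JSON shape: a `topics` array whose entries carry `name`, `weight` and `polarity` fields, with made-up values. The service's real response format may differ:

```python
import json

# Hypothetical response; field names and values are assumptions,
# not OpenAmplify's actual schema or output.
SAMPLE_RESPONSE = """
{
  "topics": [
    {"name": "arson", "weight": 0.62, "polarity": -0.8},
    {"name": "court", "weight": 0.31, "polarity": -0.1},
    {"name": "man",   "weight": 0.45, "polarity": 0.0}
  ]
}
"""

def top_topics(response_text, limit=2):
    """Return the names of the `limit` highest-weighted topics."""
    topics = json.loads(response_text)["topics"]
    ranked = sorted(topics, key=lambda t: t["weight"], reverse=True)
    return [t["name"] for t in ranked[:limit]]

print(top_topics(SAMPLE_RESPONSE))  # ['arson', 'man']
```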

OpenAmplify provides four analyses that can be requested separately or as a group. Topical signals include polarity (positive or negative perception of each topic) and guidance (the degree to which guidance is sought or offered about each topic), and they also list proper names and referenced locations. Action signals measure decisiveness (how likely the action is to be taken), guidance (whether guidance is sought or offered on taking the action) and temporality (when the action may take place). Stylistic signals indicate flamboyancy (a measure of how "flowery" the writing style is) and slang (the degree to which slang vocabulary is used). Finally, demographic signals estimate the likely age, gender and education level of the text's author or audience.
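Returning to the question of how Twitter users who mention "iran" feel: given a per-tweet polarity score for that topic, one simple approach is to average the scores. The Python sketch below assumes polarity is a number from -1.0 (negative) to +1.0 (positive) and uses made-up sample values. Both the scale and the 0.1 "neutral" band are assumptions, not part of OpenAmplify's output:

```python
from statistics import mean

# Hypothetical per-tweet results: topic name -> polarity score
# on an assumed -1.0 (negative) .. +1.0 (positive) scale.
tweet_polarities = [
    {"iran": -0.6, "election": -0.4},
    {"iran": 0.2},
    {"iran": -0.3, "protest": 0.5},
    {"weather": 0.9},  # does not mention the topic, so it is skipped
]

def prevailing_sentiment(results, topic, threshold=0.1):
    """Average the topic's polarity over the results that mention it."""
    scores = [r[topic] for r in results if topic in r]
    if not scores:
        return "no data", None
    avg = mean(scores)
    if avg > threshold:
        label = "positive"
    elif avg < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, avg

print(prevailing_sentiment(tweet_polarities, "iran"))
```

A plain mean treats every tweet equally. Weighting each score by how prominent the topic is in that tweet would be a reasonable refinement.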

The service is free for up to 1,000 transactions per day (note that a single request may involve more than one transaction) and commercial terms are available for volume users.
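With a free tier capped at 1,000 transactions per day, a client may want to track its own usage. This Python sketch leaves the per-request cost to the caller, because the article says only that a request *may* count as more than one transaction. It also assumes the day boundary follows the client's local date, which may not match how the service actually counts:

```python
from datetime import date

class DailyQuota:
    """Track transactions against a per-day limit (assumed local date)."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = date.today()
        self.used = 0

    def try_spend(self, cost, today=None):
        """Record `cost` transactions if they fit in today's budget.

        `cost` is supplied by the caller; how many transactions a
        request consumes is an assumption the caller must make.
        """
        if today is None:
            today = date.today()
        if today != self.day:  # a new day resets the counter
            self.day, self.used = today, 0
        if self.used + cost > self.limit:
            return False
        self.used += cost
        return True
```

A caller would check `try_spend(...)` before each request and hold the request back when it returns `False`.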

Mark Gibbs is a consultant, author, journalist, columnist and blogger.
