Microsoft Subnet: An independent Microsoft community

Cha-ching of Scraping: Data Brokers Digging Up & Selling Your Digital Dirt

Data brokers scrape password-protected private forums, trawl the Net for digital dirt, listen in, and profit from your personal data. Now one wants to tie your real name to your online alias.

The default assumption when digging up digital dirt on a person seems to be that if it's online, then it's fair game. Did you know, however, that some companies that scrape and sell your personal data will disregard any ethical concerns and use automated software to log into private message boards and scrape your info? There are many personal and private subjects that people online may share with like-minded users, such as physical or mental health topics.

The Wall Street Journal reported that site scraping happens all the time, ranging from free do-it-yourself scraping software to screen-scrapers that charge "between $1,500 and $10,000 for most jobs." The website PatientsLikeMe.com discovered that media-research firm Nielsen Co. was scraping all messages off its private online forum, messages that were "supposed to be viewable only by members who have agreed not to scrape, and not by intruders such as Nielsen." Forums on PatientsLikeMe include topics such as "AIDS, supranuclear palsy, depression, organ transplants, post-traumatic stress disorder and self-mutilation."

Many people might think that health data from password-protected private forums would not be included in sold behavior and personal information, but a Nielsen spokesman confirmed the company's reports include information gleaned from the Internet, "so if someone decides to share personally identifiable information, it could be included."

What it comes down to is that if you post online, your data can be sold or used to find connections to people and events. Although the company Intelliseek decided it was ethically wrong to scrape private message boards, Nielsen bought Intelliseek in 2006. Also interesting is that in 2001, the CIA's In-Q-Tel invested in Intelliseek. In 2009, the CIA investment arm In-Q-Tel and Google Ventures provided funding for Recorded Future, a predictive analysis company that trawls over half a million websites, Twitter feeds, YouTube, and blog posts, looking for connections between people, groups, and events. What a small world it turns out to be where invading privacy and selling personal information are concerned.

BlueKai, the "world's largest data exchange," advertises that before it came along, "we lived in the Data Dark Ages." True or not, if it wants to claim credit, then thanks, BlueKai. Now it's big business to track people's online activities and then sell those personal and behavioral interests, just as it is a booming business for companies to hire others to track what is being said about them or their brand. While NASA scientists are dealing with the Supreme Court over the government's need to conduct background checks for "low-risk" work, human resources at many companies "Google" or otherwise engage in social media identity and background checks to find potential candidates.

The data broker Spokeo claimed its services could provide financial data and credit ratings to help in making employment decisions. Forbes reported on another creepy service, Social Intelligence Corp., "taking the traditional background checks that are commonly used by corporate human resource departments to look for things like criminal records and moving them online to track social media networks, including Facebook, Twitter, Flickr, Youtube, LinkedIn, and individual blogs."

Another data collector and seller is PeekYou. It calls itself "a new kind of online white pages." PeekYou's privacy policy claims that "PeekYou acknowledges the concerns that you may have about the use of information about you. Out of respect for those concerns, we maintain strict policies to ensure the privacy and security of your personal information that we may collect through your participation on our Site.... We make every effort to preserve user privacy."

However, if the company regards privacy so highly, then why has PeekYou applied for a patent to scrape the Net and then attach real names to online aliases? The WSJ stated that PeekYou's patent includes, among other things, a method to match "people's real names to the pseudonyms they use on blogs, Twitter and other social networks." Mashable endorsed, "PeekYou makes people search worthwhile." Here's a peek at part of PeekYou's patent.

Most people know about cookies, trackers and their data being sold for targeted marketing. For example, WSJ found 115 tracking files on Microsoft's live.com. Of those: 81 don't let users opt out, 30 may share information, 25 may collect financial and health data, and 61 may keep your information indefinitely. Although you might not like your personal data to be collected and sold, it seems likely that most people know the risks associated with social networks that allow anyone to see any public comments that are made on the site. If you post an online resume, then it will usually have your name, address, email and phone number. That is not quite the same as posting under an alias not tied to your personal data or posting on a private password-protected site, saying something sensitive like health details. But, sadly, if it's online, scrapers will dig for any digital dirt that they can connect to you.

To PeekYou and other data brokers who dig up and sell digital dirt or personal information, I'm curious how ethical you would find it to conduct that search in real life rather than online. Would you camp outside a person's home to record their comings and goings, friends, family and any other visitors? Would you snap pictures and use GPS mapping? Would you follow them so you could listen in on their conversations? Would you dig through their garbage to see what you can find out about them, their patterns, behaviors, finances or other personal preferences? Or would that seem unethical? Would you want someone to do it to you?

The Privacy Rights Clearinghouse lists online data brokers and scrapers. We clearly need all the help we can get to protect our privacy in this world of drip, drip leaking and location-aware online apps, scraping of personal data, and tapping on the wire. Can you hear the cha-ching of profit being made off your data?

Follow me on Twitter @PrivacyFanatic