Some Google observers are concerned that a new privacy policy announced by the Web search giant may contain holes that could make it possible to connect search logs to the names of users, potentially defeating the purpose of Google’s plan to make records about user searches anonymous after 18 to 24 months.
Google will alter cookie information and change the last eight bits of the 32-bit IP addresses that identify computers logged onto the company’s search engine, under a policy announced last week. This means there is only a “partial de-identification” of users, says Pam Dixon, founder and executive director of the nonprofit World Privacy Forum.
“If there was a data breach and it all got out, you wouldn’t get the entire IP address. That’s a step,” she says. “But if you were involved in a legal process and wanted to re-identify the data, it can be done. … This is not a cloak of privacy that has been put over user searches.”
According to a statement released by Google Tuesday, someone with access to an IP address in which the last eight bits are obscured could narrow the address’s location down to a group of 256 computers, but would not be able to figure out which of those computers the IP address belongs to.
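The arithmetic behind the figure of 256 can be shown with a short sketch. Google has not said how it will change the last eight bits, so the code below simply assumes they are set to zero. The function names and the sample address, taken from a range reserved for documentation, are hypothetical.

```python
import ipaddress


def mask_last_octet(ip):
    """Return the IPv4 address with its last eight bits set to zero.

    Zeroing is an assumption made for illustration; the article does not
    say which method Google will use to alter those bits.
    """
    as_int = int(ipaddress.IPv4Address(ip))
    return str(ipaddress.IPv4Address(as_int & 0xFFFFFF00))


def candidate_addresses(masked_ip):
    """List every address that would produce the same masked value."""
    network = ipaddress.IPv4Network(masked_ip + "/24")
    return [str(addr) for addr in network]


masked = mask_last_octet("192.0.2.77")    # sample address, not a real user
candidates = candidate_addresses(masked)  # 2**8 = 256 possible originals
```

Eight bits can take 2^8 = 256 values, so the obscured record is consistent with any one of 256 original addresses.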
Privacy advocates have focused on Google and other search engines because the phrases people search for provide insight into their personal histories, including diseases they might have. Google says it keeps search logs to analyze usage patterns and diagnose system problems. Privacy advocates worry that keeping search records for extended periods opens the door for law enforcement agencies to demand information that could identify users.
A second concern about Google’s new policy was raised in a blog posting by Forrester Research security analyst Jen Albornoz Mulligan. If an anonymized IP address is always tied to the same computer, the user could still be identified, because people tend to search for their own names on Google, she argues. AOL was using this approach to anonymize IP addresses last year when the company accidentally released a database containing the search histories of more than 650,000 AOL users, Mulligan says.
Google has not yet decided exactly how it will go about changing the last eight bits of IP addresses. “We’re still developing the precise technical methods and approach to this,” the company states on its Web site.
Mulligan notes in her blog that Google officials “make no mention of preventing a similar AOL disclosure snafu by ensuring that individual searches by the same person are anonymized in different ways.”
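Mulligan's distinction can be illustrated with a rough sketch. It does not reproduce AOL's or Google's actual methods, which the article does not describe in detail. The salted hash, the random tokens, the function names and the sample log entries are all assumptions invented for the example.

```python
import hashlib
import secrets
from collections import defaultdict


def stable_pseudonym(ip, salt=b"example-salt"):
    """Same input always yields the same ID, so a user's searches stay linked."""
    return hashlib.sha256(salt + ip.encode()).hexdigest()[:12]


def per_record_token():
    """A fresh random ID for every record, so searches are not linked."""
    return secrets.token_hex(6)


def group_by_id(records):
    """Collect queries under each identifier, as someone reading a leak might."""
    groups = defaultdict(list)
    for ident, query in records:
        groups[ident].append(query)
    return dict(groups)


# Hypothetical log entries; the addresses come from documentation ranges.
log = [
    ("198.51.100.7", "jane doe"),
    ("198.51.100.7", "treatment for a medical condition"),
    ("203.0.113.9", "weather forecast"),
]

linked = group_by_id((stable_pseudonym(ip), q) for ip, q in log)
unlinked = group_by_id((per_record_token(), q) for ip, q in log)
```

With the stable identifier, the name search and the medical search fall under one ID, which is the linkage Mulligan warns about. With per-record tokens, each query stands alone.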
Chances are, though, that Google will not make the same mistake as AOL, Mulligan says.
“I would certainly guess they are smart enough to not do this,” Mulligan said in a phone interview. “I would have imagined they would learn the lessons from AOL.”
Google’s policy is designed in part to comply with a European Union directive requiring data retention for between six months and two years. The exact retention period will be determined individually by each European country.