Sleep and other patterns pinpoint individuals in datasets, study finds

Simple behavior, tracked by geotagged posts on social media apps, is enough to spot individuals in a dataset

A human’s “real-world movements” are so unique that people can be distinguished by their patterns, a new study conducted by Columbia University and Google finds. And that’s even if the datasets are anonymized.

Sleep cycles captured by fitness IoT products, commuting schedules stored by bots, the days of the week that one goes to work and other habits could all one day be used to discern one person from another, the study says.

What’s more, the computer scientists say a single dataset, such as a few bank card transactions, is enough to obtain results.

And the researchers reckon that if you start adding more datasets, you can get even better results. They say location tracking is one of the keys to figuring out which records belong to the same person. In that case, geotagged posts on just a couple of social media apps are all that’s needed to associate accounts held by an individual.

The secret is correlating the datasets. For example, shoppers can be picked out by combining anonymous bank card purchases with cellphone logs in a geographic area. With the locations, such as the cell tower logs, and the credit card purchases, you can identify individuals accurately, the scientists say.

However, using just one dataset works, too.

The scientists identified individuals in their experiments using an algorithm.

“It works by calculating the probability that one person posting at a given time and place could also be posting in a second app, at another time and place,” the article says.
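As a rough illustration of that idea, and not the researchers' actual model, the sketch below asks a simpler yes-or-no question: could one person physically have made both posts? Two posts count as compatible if the distance between them can be covered in the time that separates them. The post format `(hours, latitude, longitude)`, the 120 km/h speed cap and every function name here are assumptions made for this example.

```python
import math

# Assumed upper bound on how fast a person travels between posts.
MAX_SPEED_KMH = 120.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def compatible(post_a, post_b, max_speed_kmh=MAX_SPEED_KMH):
    """True if one person could have made both posts.

    Each post is a hypothetical (hours, latitude, longitude) tuple.
    """
    t1, lat1, lon1 = post_a
    t2, lat2, lon2 = post_b
    distance = haversine_km(lat1, lon1, lat2, lon2)
    return distance <= max_speed_kmh * abs(t2 - t1)


def compatibility_score(posts_a, posts_b):
    """Fraction of cross-app post pairs that one person could have made."""
    pairs = [(a, b) for a in posts_a for b in posts_b]
    if not pairs:
        return 0.0
    return sum(compatible(a, b) for a, b in pairs) / len(pairs)


# Made-up sample data: one account on each of two apps.
app1_account = [(0.0, 40.7128, -74.0060)]   # New York
app2_nearby = [(0.5, 40.7300, -74.0000)]    # New York, 30 minutes later
app2_far = [(0.5, 34.0522, -118.2437)]      # Los Angeles, 30 minutes later

same_person_possible = compatibility_score(app1_account, app2_nearby)
same_person_impossible = compatibility_score(app1_account, app2_far)
```

A hard cutoff like this only rules pairings out. A probabilistic score, as the quote describes, would also say how likely each surviving pairing is.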

Tracking human behaviors

One day that could be extended to other patterns, such as sleep. People with exact, regular, non-sleep-deprived habits, such as those that may become more normal as future robots take over our work roles, could conceivably be precisely picked out.

For example, regularly waking up at 7 a.m. and pressing a nearby IoT light switch a moment later provides data points that may very well be unique within an apartment building served by the same microcell-driven internet connection. That’s you and only you, even if the dataset is stripped of any identifying name.

“Location datasets are a particularly fruitful domain to study,” the researchers say in their paper (PDF), “especially when linked to other datasets.”

The team says their algorithm “leverages any pair of sporadic location-based datasets to determine the most likely matching between the users it contains.” In other words, the digital traces left behind by individuals can be blended and amalgamated.
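To make “most likely matching” concrete, here is a minimal sketch under stated assumptions. It drops each event into a coarse time-and-place bucket, counts the buckets two accounts share, and tries every possible pairing of accounts to find the one with the highest total. The bucket sizes, function names and sample data are hypothetical, and the brute-force search is only practical for a handful of users. The researchers' own scoring is probabilistic and is not reproduced here.

```python
import math
from itertools import permutations


def bucket(event, hours_per_bin=1.0, cell_deg=0.01):
    """Map an (hours, latitude, longitude) event to a coarse time/place cell."""
    t, lat, lon = event
    return (
        math.floor(t / hours_per_bin),
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
    )


def overlap(events_a, events_b):
    """Number of time/place cells that two accounts have in common."""
    return len({bucket(e) for e in events_a} & {bucket(e) for e in events_b})


def best_matching(dataset_a, dataset_b):
    """Pair each account in dataset_a with a distinct account in dataset_b.

    Each dataset maps an account id to a list of events. Assumes dataset_a
    has no more accounts than dataset_b. Tries every pairing, so it is
    only suitable for tiny examples.
    """
    ids_a = list(dataset_a)
    best_total, best_pairs = -1, {}
    for candidate in permutations(dataset_b, len(ids_a)):
        total = sum(
            overlap(dataset_a[a], dataset_b[b]) for a, b in zip(ids_a, candidate)
        )
        if total > best_total:
            best_total, best_pairs = total, dict(zip(ids_a, candidate))
    return best_pairs


# Made-up sample data: two "anonymized" accounts in each of two datasets.
dataset_a = {
    "a1": [(8.2, 40.7128, -74.0060), (18.5, 40.7306, -73.9866)],
    "a2": [(8.1, 34.0522, -118.2437), (12.4, 34.0407, -118.2468)],
}
dataset_b = {
    "b1": [(12.7, 34.0408, -118.2467), (8.6, 34.0523, -118.2436)],
    "b2": [(8.7, 40.7127, -74.0061), (18.1, 40.7305, -73.9867)],
}

matching = best_matching(dataset_a, dataset_b)
```

On this sample, each account in the first dataset shares both of its buckets with exactly one account in the second, so the pairing is unambiguous. Real traces are far sparser and noisier, which is why a weighted, probabilistic score matters in practice.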

But learning user behavior to make recommendations, such as product recommendations based on geotagging, isn’t inherently bad, they say. The problem arises when the user doesn’t know it is being done, such as by a marketing firm they have no association with.

I wrote about how mobile networks harvest user data, anonymize it and then sell it on to marketers in “How wireless providers are quietly cashing in on your location data.”

“The idea that these traces can all be merged and connected is both fascinating and unsettling,” the researchers say.
