Predicting crime with Big Data ... welcome to "Minority Report" for real

Big Data can reveal amazing insights and now includes predictions on where crime is going to happen


Crime has patterns just like everything else humans do when we're viewed as a large enough group. Thus, while individual behavior can be hard to predict, determining the average behavior of a population and then matching individuals to that template to determine “fit” can be surprisingly accurate.
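To make the "template" idea a little more concrete, here is a minimal sketch, assuming each person can be reduced to a short list of numeric features. The feature meanings, the function names, and all the numbers are made up for illustration, and plain Euclidean distance is only one of many ways to measure "fit".

```python
from math import sqrt
from statistics import mean


def population_template(profiles):
    """Average each feature across the population to get a 'template'."""
    n_features = len(profiles[0])
    return [mean(p[i] for p in profiles) for i in range(n_features)]


def fit_distance(individual, template):
    """Euclidean distance from the template; smaller means a closer fit."""
    return sqrt(sum((x - t) ** 2 for x, t in zip(individual, template)))


# Hypothetical feature vectors (say, visits per week and average spend).
population = [[2.0, 30.0], [4.0, 50.0], [3.0, 40.0]]
template = population_template(population)

close = fit_distance([3.0, 41.0], template)  # near the population average
far = fit_distance([9.0, 5.0], template)     # far from the population average
```

An individual with a small distance resembles the average member of the group; a large distance means the template says little about them.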

This is the world of predictive analytics, the scientific version of a crystal ball. Instead of peering into a glass globe, you peer into (ideally) massive amounts of data and, using Big Data mining techniques such as statistics, modeling, and machine learning, look for patterns that are indicative of current or future behavior.

Predictive analytics has become very sexy over the last few years and has produced some impressive insights into human behavior and, occasionally, problematic revelations (see footnote below).

A recently published paper titled "Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data" by Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, and Alex Pentland discusses the use of mobile phone data and demographic data to predict crime geographically:

The main contribution of the proposed approach lies in using aggregated and anonymized human behavioral data derived from mobile network activity to tackle the crime prediction problem. While previous research reports have used either background historical knowledge or offenders' profiling, our findings support the hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, in combination with basic demographic information, can be used to predict crime. In our experimental results with real crime data from London we obtain an accuracy of almost 70% when predicting whether a specific area in the city will be a crime hotspot or not. Moreover, we provide a discussion of the implications of our findings for data-driven crime analysis.
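To put the "almost 70%" figure in context, here is a minimal sketch of how accuracy can be scored when each area is predicted to be either a hotspot or not. This is not the authors' method: the median cutoff used to define a hotspot, the area names, and every number below are assumptions made purely for illustration.

```python
from statistics import median


def label_hotspots(crime_counts):
    """Label an area a hotspot (True) if its crime count exceeds the median."""
    cutoff = median(crime_counts.values())
    return {area: count > cutoff for area, count in crime_counts.items()}


def accuracy(predicted, actual):
    """Fraction of areas where the predicted label matches the actual label."""
    hits = sum(predicted[area] == actual[area] for area in actual)
    return hits / len(actual)


# Hypothetical crime counts per area and hypothetical model predictions.
crime_counts = {"A": 12, "B": 3, "C": 7, "D": 1, "E": 9}
actual = label_hotspots(crime_counts)
predicted = {"A": True, "B": False, "C": True, "D": False, "E": False}

score = accuracy(predicted, actual)  # 3 of the 5 areas match
```

With a two-way split like this, a coin flip would be right about half the time on evenly balanced labels, which is the baseline any reported accuracy should be read against.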

This is a fascinating paper and the approach, as many commentators have pointed out, is eerily reminiscent of The Minority Report. Here’s a map of predicted crime in and around London:

Predicted Crime Hotspots (image credit: Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, Alex Pentland)

This was derived from "anonymized and aggregated human behavioral data computed from mobile network activity in the London Metropolitan Area ... Telefonica Digital's [Smart Steps] product."

Sample visualization of the high-level information from Smart Steps (image credit: Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, Alex Pentland)

Smart Steps was designed so that "you can analyze footfall in any specified location and see the catchment of any specified area." When Smart Steps data is combined with crime data and massaged, it provides a unique insight not only into where crime hotspots are but also, as the paper explains, into where crime is likely to happen and, by extension, is currently happening.

This all sounds pretty good, but there is a potential downside: anonymized data often isn't that anonymous (see "Measuring Risk and Utility of Anonymized Data Using Information Theory" and "Anonymisation: managing data protection risk code of practice"), so actually tracking specific people could be possible and could lead to abuse in real-world deployments.
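One common way to see why stripping names isn't enough is to count how many records share the same combination of remaining attributes. The sketch below is a simplified uniqueness check in the spirit of k-anonymity, not a method taken from either of the documents cited above; the field names and rows are hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def unique_records(records, quasi_identifiers):
    """Return records whose quasi-identifier combination appears only once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] == 1
    ]


# Hypothetical "anonymized" rows: no names, but other attributes remain.
rows = [
    {"home_area": "N1", "age_band": "30-39", "night_cell": "c17"},
    {"home_area": "N1", "age_band": "30-39", "night_cell": "c17"},
    {"home_area": "E2", "age_band": "60-69", "night_cell": "c04"},
]

# Rows that stand alone on (home_area, age_band) are the easiest to re-identify.
exposed = unique_records(rows, ["home_area", "age_band"])
```

Any record that is alone in its group can potentially be matched to a real person by someone who already knows those few attributes about them.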

Even so, it's probable that predictive analytics for geolocating future crime areas will become an accepted and valuable law enforcement technique. 


[Thanks to Jerry Dixon]

Footnote: Back in 2012 I discussed the case of Target using data mining to find which of their customers had just become pregnant: 

Target's marketing department apparently wanted to determine if a customer was pregnant even if she didn't want Target to know. They asked this creepy question because, according to [an article in The New York Times], "new parents are a retailer's holy grail."

It turns out that consumers' shopping habits are mostly fixed and very hard to change. You, for example, might always buy your groceries at Ralph's, your toys at Toys R Us, get your prescriptions from CVS, and so on, but when a new baby arrives these habits are far more malleable because the parents' lives are in turmoil (maybe "chaos" would be a better description).

What Target wanted to do was "educate" expectant mothers that Target is a one-stop shop where the majority of day-to-day consumer requirements are available, thus simplifying their newly complicated lives.

While births are a matter of public record (as anyone who has had a child knows because you are immediately bombarded with endless sales offers relating to your new and exhausting status), being pregnant is essentially (but only for now) a private matter.

So, Target's marketing folk reasoned, because we monitor shopping habits in detail, there might be enough data to determine if a customer was pregnant and, if so, that would enable Target to send expectant mothers advertising weeks or months in advance of the birth and potentially change the parents' buying habits.

Now, you may be wondering, how does Target monitor consumers' shopping habits? Actually it is pretty straightforward: It assigns each customer a "Guest ID" to which any and all data is attached, including every credit card transaction, every website visit, every survey filled out, every coupon redeemed ... in fact, everything and anything is grist for Target's data mill.
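As a rough illustration of the idea, and not a description of Target's actual systems, a per-customer ID can act as the key under which every observed event is filed. The event fields, ID format, and function name below are all hypothetical.

```python
from collections import defaultdict


def build_profiles(events):
    """Group every recorded event under the guest ID it belongs to."""
    profiles = defaultdict(list)
    for event in events:
        profiles[event["guest_id"]].append((event["kind"], event["detail"]))
    return dict(profiles)


# Hypothetical events arriving from different channels.
events = [
    {"guest_id": "G-001", "kind": "card_purchase", "detail": "paper towels"},
    {"guest_id": "G-002", "kind": "coupon", "detail": "cereal"},
    {"guest_id": "G-001", "kind": "site_visit", "detail": "coffee"},
]

profiles = build_profiles(events)
```

Once everything is keyed to one ID, purchases, site visits, and coupons that would otherwise look unrelated become a single history that can be mined.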

Along with that data it also assembles consumers' demographic data. According to the NYT article this includes "your age, whether you are married and have kids, which part of town you live in, how long it takes you to drive to the store, your estimated salary, whether you've moved recently, what credit cards you carry in your wallet and what websites you visit."

In addition, Target buys data about "your ethnicity, job history, the magazines you read, if you've ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own."

This may sound insane, but don't think Target is even remotely unusual in its customer data acquisition and mining practices.

Target has a department named "Guest Marketing Analytics" and its ace statisticians looked at the mountains of data and concluded that there were, indeed, certain buying patterns that indicated not just that a customer was pregnant, but also when, approximately, she was due! Thus, Target started sending out promotions based on these insights and, sure enough, the promotions worked!

Then, about a year after the strategy had been implemented, a man walked into a Target store and complained angrily to the manager that his 14-year-old daughter had been sent offers for cribs and baby clothes. He was, it was reported, furious and asked "Are you trying to encourage her to get pregnant?"

The manager was very apologetic and some days later called the father to apologize again. Much to the manager's surprise it was the father's turn to be apologetic. He had talked with his daughter and to his surprise she was, indeed, pregnant! Target knew before the girl's family just by mining data!


Copyright © 2014 IDG Communications, Inc.