CIO - NEW YORK CITY--It may seem strange if you don't live in this urban concrete-and-glass jungle, but New Yorkers love their trees. Tourists may flock to Times Square, but New Yorkers know their parks are the city's heart and soul: Central Park in Manhattan, Prospect Park in Brooklyn, Flushing Meadows Corona Park in Queens, Van Cortlandt Park in the Bronx, the Greenbelt in Staten Island and the hundreds of smaller parks and urban green spaces that dot the five boroughs. And, of course, there are the trees that line the streets.
In all, there are roughly 2.5 million trees in New York City. The citizens of the city love them, but for City of New York Parks & Recreation they're a big problem, though one that big data analytics can solve.
Brian Dalessandro, data ambassador for DataKind, leads a DataDive on tree pruning data from City of New York Parks & Recreation.
It's not just a dollars-and-cents problem either; it's about lives. In an 11-month span from 2009 to 2010, four people were killed or seriously injured by falling tree limbs in Central Park alone, including a six-month-old girl who was crushed to death in June 2010. Nearly a year earlier, a 100-pound limb fell from an oak tree in Central Park, fracturing the skull and partially severing the spine of a 37-year-old Google software engineer.
Arborists believe that pruning and otherwise maintaining trees can keep them healthier and make them more likely to withstand a storm, decreasing the likelihood of property damage, injuries and deaths. While this is the conventional wisdom, there hasn't been any research or data to back it up, says Brian Dalessandro, vice president of Data Science at media6degrees (m6d), provider of a machine learning-based ad targeting platform, and a Data Ambassador for DataKind, which helps unite volunteer data scientists with nonprofit and civic organizations that have big data problems.
Leveraging Machine Learning Skills to Answer a Causal Question
"Years ago, NYC Parks created a program for taking better preventative care of the city's trees," Dalessandro says. This program involves a regular schedule of pruning and grooming large trees in an effort to reduce the risk of damage from storms and high winds. For years, the department kept a record of which blocks were pruned, as well as how many times they had to dispatch a crew to remove fallen branches and upended trees.
Armed with all of this data, they approached DataKind with the following question: "Does pruning trees in one year reduce the number of hazardous tree conditions in the following year?"
Savvy advertisers and those schooled in analytics will recognize that the department was asking a causal question, and causal analysis is one of the most difficult forms of analysis one can do without a formal experiment. And let's face it, Dalessandro says, you can't A/B test the problem because you'd essentially be experimenting with people's lives.
But with the right data, you can statistically recreate an experiment, Dalessandro says, and his experience in the advertising world equipped him with the skillset to do just that. Only a few years ago, his team at m6d figured out how to estimate the causal impact of ads by analyzing impression logs. But approaching the city's problem wasn't cut-and-dried. After all, while the city had been collecting lots of data, it had been collecting it for reporting purposes, not for actionable insight.
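To make the idea of statistically recreating an experiment more concrete, here is a minimal Python sketch of one common approach, stratification: compare pruned and unpruned blocks only within groups of similar blocks, then average the within-group differences. The article does not say which method the DataKind volunteers used, so treat this purely as an illustration. The records, the `tree_density` grouping and the function names are all invented for the example and are not the city's data.

```python
from collections import defaultdict
from statistics import mean

# Synthetic, made-up records (NOT NYC Parks data):
# (tree_density, pruned_last_year, hazards_this_year)
BLOCKS = [
    ("high", True, 4), ("high", True, 5), ("high", True, 6),
    ("high", False, 8),
    ("low", True, 1),
    ("low", False, 2), ("low", False, 3), ("low", False, 1),
]


def naive_effect(blocks):
    """Difference in mean hazards, pruned minus unpruned, ignoring block type."""
    pruned = [hazards for _, was_pruned, hazards in blocks if was_pruned]
    unpruned = [hazards for _, was_pruned, hazards in blocks if not was_pruned]
    return mean(pruned) - mean(unpruned)


def stratified_effect(blocks):
    """Average of within-stratum differences, weighted by stratum size.

    Assumes (hypothetically) that tree density is the only thing that
    influences both the pruning decision and the hazard count.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for density, was_pruned, hazards in blocks:
        strata[density][was_pruned].append(hazards)

    weighted_sum = 0.0
    total_blocks = 0
    for groups in strata.values():
        if not groups[True] or not groups[False]:
            continue  # no pruned/unpruned comparison possible in this stratum
        size = len(groups[True]) + len(groups[False])
        weighted_sum += size * (mean(groups[True]) - mean(groups[False]))
        total_blocks += size

    if total_blocks == 0:
        raise ValueError("no stratum contains both pruned and unpruned blocks")
    return weighted_sum / total_blocks
```

On this toy data, `naive_effect(BLOCKS)` is 0.5, which would suggest pruned blocks have more hazards, while `stratified_effect(BLOCKS)` is -2.0, suggesting fewer. The reversal happens because the invented pruned blocks are mostly high-density blocks, which have more hazards to begin with. This is the kind of trap that makes causal questions hard to answer from records kept for reporting.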