How to collect and analyze data from 100,000 weather stations

Want to know what it’s like to analyze massive amounts of data under pressure? Talk to Bryson Koehler, CIO of The Weather Company (which owns the Weather Channel), who must interpret data sets from around the world to predict something as volatile as the weather.

If you want to understand what it takes to collect, track and analyze reams of data, just check the weather. There are constant fluctuations, scores of data points and intense interest from all over the planet. Analyze the data correctly and someone in the state of Washington knows whether or not to wear a raincoat. Do it poorly and there might be a massive traffic pileup from people driving too fast on slick roads. 

Bryson Koehler understands this dynamic. As CIO of The Weather Company, he’s charged with increasing the accuracy of weather forecasting for the various entities the company owns, which include the Weather Channel and the Weather Underground mobile app. 

The app in particular uses a massive personal sensor network to increase accuracy. Even a smartphone can be a basic weather station: The Weather Company uses algorithms that can determine the outside temperature for that user based on what the phone is reporting. 

There are 100,000 sensors sending data worldwide (and 40,000 in the United States alone). Understandably, processing the data is no easy task. 

“Some of the data is interesting – such as lightning data or pollen data – and it doesn’t always help us create a forecast, but we can tell people who have allergies what to expect,” Koehler says. “Other types of data we get in real time, such as aircraft telemetry data – installations on commercial aircrafts that we bring down in real time to see atmospheric conditions.” 

Koehler says the flight data is incredibly helpful. It can be used to alert airlines about possible changes in flight plans, or let them know the wear and tear on a plane is not as significant as it might have seemed during a flight. This data can help minimize delays, since the airlines are required to do extra safety checks related to severe weather. The Weather Company can tell if the real-time weather data did not reach as high a threshold as the pilot might have reported. 

The analysis is intense. Stations provide data for humidity, barometric pressure, dew point, UV load, rainfall, wind and many other factors. There are billions of reports sent in each month, according to Koehler. The station data is repurposed into a format people can use and understand. 
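
To get a feel for what "billions of reports" a month means for the infrastructure, here is a rough back-of-envelope sketch. The figure of 2 billion reports per month is a hypothetical stand-in for Koehler's "billions" (the article gives no exact number), and the calculation assumes reports are spread evenly across the 100,000 stations and across a 30-day month, which real traffic would not be.

```python
# Back-of-envelope estimate of ingest rate.
# ASSUMPTION: 2 billion reports/month is a hypothetical value standing in
# for "billions"; the article does not give an exact figure.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def reports_per_second(reports_per_month: float) -> float:
    """Average number of reports arriving each second."""
    return reports_per_month / SECONDS_PER_MONTH


def seconds_between_reports(reports_per_month: float, stations: int) -> float:
    """Average gap between reports from one station, assuming an even spread."""
    reports_per_station = reports_per_month / stations
    return SECONDS_PER_MONTH / reports_per_station


if __name__ == "__main__":
    monthly = 2e9        # hypothetical
    stations = 100_000   # from the article
    print(f"{reports_per_second(monthly):.0f} reports per second on average")
    print(f"one report per station every "
          f"{seconds_between_reports(monthly, stations):.1f} seconds")
```

Under those assumptions the average works out to several hundred reports every second, with each station checking in roughly every two minutes. Peaks would be higher, since some stations report far more often than others.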

“People can pull up different layers of maps, and they can pull up forecasts from all over the globe,” he says. “In contrast, the National Weather Service in the U.S. has about 3,500 recording stations that they own and operate on behalf of U.S. taxpayers.” 

More instruments mean more data

It’s an interesting dilemma to have such an abundance of data to process. Koehler says that the NWS is one of the world’s “most instrumented” government agencies. Yet The Weather Company has to deal with many thousands of personal weather stations worldwide. Some of the stations are not easily accessible – they could be in a remote region of Iceland. Some of the weather sensors are as small as a Coke can, and some involve an antenna that is three feet tall. 

The Weather Company acts as a “clearinghouse” for this data collection, says Koehler. The company monitors the stations and knows exactly how each one works: for example, that one station is a RainWise product that collects data every second, while a Netatmo station might not collect as often. 

Part of the challenge is in interpreting the data correctly. The Weather Company might look for trends from data collected from multiple phones and stations in the same area. The company has figured out how to compare data sets with varying levels of accuracy and quality and still derive some value, especially in terms of weather trends. All of the collected data is valuable, Koehler says. 
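
One common way to combine readings of uneven quality is an inverse-variance weighted average, in which more trustworthy sources count for more. The sketch below illustrates that general statistical technique; it is not a description of The Weather Company's actual algorithms, which the article does not detail. The `Reading` class, the source names and the error figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One temperature report from a nearby source (hypothetical structure)."""
    source: str
    temperature_c: float
    error_std_c: float  # assumed typical error of this source, in degrees C


def combine(readings: list[Reading]) -> float:
    """Inverse-variance weighted mean: low-error sources get more weight."""
    if not readings:
        raise ValueError("need at least one reading")
    if any(r.error_std_c <= 0 for r in readings):
        raise ValueError("error_std_c must be positive")
    weights = [1.0 / (r.error_std_c ** 2) for r in readings]
    weighted_sum = sum(w * r.temperature_c for w, r in zip(weights, readings))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Hypothetical values: a dedicated station is trusted more than a phone.
    nearby = [
        Reading("backyard-station", 10.0, 0.5),
        Reading("smartphone", 13.0, 1.0),
    ]
    print(f"combined estimate: {combine(nearby):.1f} C")
```

In this example the station, with half the assumed error of the phone, receives four times the weight, so the combined estimate lands much closer to the station's reading. The phone still nudges the result, which fits Koehler's point that even lower-quality data has some value.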

Interestingly, the data sets are typically quite small. In total, Koehler says his company collects a “couple hundred” terabytes from personal weather stations. 

A whole lotta ping

“It’s a very chatty environment,” he says. “There is a high frequency of ping. So we have to use a very scalable infrastructure, since there are a few hundred devices added every day. And the frequency of the data input continues to rise.” 

The Weather Company had previously been using Amazon Web Services for the data collection and processing. At the time of this writing, the company had switched to IBM Cloud, primarily due to costs and presence in the market. 

“IBM Cloud has been growing rapidly, particularly as a resource for large enterprises,” says Charles King, a noted IT expert with Pund-IT. “IBM is dedicating significant budget to rolling out a global network of cloud data centers. By partnering with IBM, the Weather Channel will benefit from IBM's global cloud resources [to support its own global network] and should also be able to monetize its assets as part of the [Internet of Things] services IBM is envisioning. 

“If, as many scientists and insurance companies believe, we're heading into a future where extreme weather events become increasingly common, the partnership should be a good deal for both companies and their respective customers,” King adds. 

“The increasing use of social and sensor networks are producing significant amounts of high-throughput data available for mining in areas like customer behavior, biological systems and environmental conditions,” says Matt Wood, general manager for Data Science at Amazon Web Services. “The critical barrier to big data, which has traditionally been the infrastructure required to collect, compute and collaborate, is now being transformed through the use of cloud computing with AWS.” 

In the end, what makes the collection from 100,000 sensors so noteworthy is that it is a major test of cloud infrastructure. King says the data is rich and layered, but fairly consistent and predictable in terms of how often the stations send in reports. Whether the reports are from an airplane, a station in Iceland or a smartphone, the algorithms are ready to help provide a more accurate weather forecast with every single ping.

This story, "How to collect and analyze data from 100,000 weather stations" was originally published by CIO.
