In-depth: BlackBerry becomes Crashberry as RIM’s net fails for hours

One thing is clear about the unprecedented overnight disruption of the BlackBerry wireless e-mail service.

Research in Motion, which runs the service from a network operations center in Waterloo, Ontario, may get a black eye from the net’s failure, but it sidestepped a haymaker, apparently by sheer luck. That’s because the disruption began in the evening, when most North American users were past or near the end of their business day.

If the service had stopped at 8 a.m. Eastern Time instead of 8 p.m., “there would have been panic in the streets,” says Frank Gilman, CTO for Allen Matkins, a big Los Angeles law firm that, like many other white collar companies, seems to be powered by an endless stream of e-mails to and from BlackBerry handhelds.

Sometimes called “Crackberries” because of their addictive nature, the wireless devices are reshaping organizational work habits and expectations, liberating users from the office even as they introduce a new level of stress and anxiety, according to some researchers. RIM’s success has attracted plenty of rivals, including Microsoft, as more and more enterprises seek to extend corporate e-mail to mobile devices.

The impact of the outage in BlackBerry service varied widely, according to some enterprise users. Some slept through it. Some had e-mails delayed for three or four hours, others for twice that long. Some were running into delays as late as Wednesday morning Eastern Time.

Even now, nearly 24 hours after RIM’s network started getting flaky, the company has given its customers almost no information about what went wrong, how it went wrong, or what RIM is doing to keep it from happening again.

Around 9 a.m. ET Wednesday, RIM sent a confidential “service interruption update” to North American customers. The e-mail says that at about 8:15 p.m. Eastern Tuesday the company’s NOC “began investigating monitoring alerts in regards to issues with BlackBerry service.”

“Initial investigations revealed an issue with the BlackBerry Infrastructure. Subsequent troubleshooting efforts were not immediately successful in restoring service,” according to the e-mail. “At approximately [2 a.m. ET, or six hours later] on April 18, in an effort to restore service, the components and services for the BlackBerry Infrastructure for the Americas were restarted.”

The update says that as of about 9 a.m. ET Wednesday the “service has been operating near expected traffic levels,” though some users may see delays until queued-up messages are cleared.

Perhaps optimistically, the update concluded, “Thank you for your continued support, and we apologize for any inconvenience.”

The inconvenience seems to have been much less than it would have been if the stream of e-mails had dried up during regular business hours. Nevertheless, there are signs that the impact has been significant. In one Web-based poll conducted Wednesday morning by ProfitLine, a telecom expense management firm, 81% of respondents (IT and telecom professionals at big companies) said the outage had caused disruptions. Almost 45% said there had been a "moderate or substantial" impact on enterprise productivity, according to ProfitLine.

But there were also unexpected consequences: the girlfriend of one BlackBerry user broke up with him because she thought he was ignoring her e-mails (see Buzzblog for more about this story). Rafael Paz, a loss control specialist for a car rental agency, writes in an e-mail that his e-mails have been "running one to four hours late minimum since [Tuesday]."

The couple had a bad argument earlier in the day. "She sent me a few e-mails and when I didn't respond right away, she thought I was ignoring her and called it off,” he writes. “I didn't get the e-mail it was over until around 2 a.m. today."

Allen Matkins’ Gilman got a call around supper time in Los Angeles Tuesday night from his law firm’s executive director, asking if anything was wrong with Gilman’s own BlackBerry. That confirmed for Gilman what he had started to suspect due to an absence of incoming e-mail: “Something must be wrong.”

What was wrong was a growing instability on RIM’s network, according to Ahmed Datoo, vice president of marketing for Zenprise, a Fremont, Calif., software vendor with an application that monitors and troubleshoots Microsoft Exchange and RIM BlackBerry enterprise e-mail systems. Zenprise BlackBerry was just introduced in February.

Zenprise watched the whole disruption unfold through one of its customers, the County of Alameda, Calif.

Typically, enterprise customers install the BlackBerry Enterprise Server software behind their firewall, and link it to the corporate e-mail system, usually Microsoft Exchange. The BES (pronounced “bez”) constantly checks with Exchange, asking if new e-mails have been delivered to the users’ inboxes. If there is new e-mail, it’s handed over to the BES, which then goes through the corporate firewall to the RIM NOC, handing it over to BlackBerry servers there. Those systems connect with the appropriate cellular carrier network to deliver the e-mail to the recipient’s BlackBerry device.
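The relay path described above can be pictured with a small simulation. This is only an illustrative sketch, not RIM's software: the class names `Bes` and `Noc`, the in-memory queue, and the message labels are all assumptions made for the example. It shows why an outage at the NOC leaves mail piling up on the enterprise's own BES, and why that backlog drains in order once service returns.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Noc:
    """Hypothetical stand-in for the RIM NOC: accepts mail only while up."""
    up: bool = True
    delivered: list = field(default_factory=list)

    def accept(self, message: str) -> bool:
        if not self.up:
            return False
        # In the real service, the hand-off to a carrier network happens here.
        self.delivered.append(message)
        return True


@dataclass
class Bes:
    """Hypothetical stand-in for a BES sitting behind the corporate firewall."""
    noc: Noc
    outbound: deque = field(default_factory=deque)

    def poll_exchange(self, inbox: list) -> None:
        # Pick up any new mail from the (simulated) Exchange inbox.
        while inbox:
            self.outbound.append(inbox.pop(0))

    def flush(self) -> int:
        # Forward queued mail in order; stop at the first refusal.
        sent = 0
        while self.outbound and self.noc.accept(self.outbound[0]):
            self.outbound.popleft()
            sent += 1
        return sent


noc = Noc()
bes = Bes(noc)

bes.poll_exchange(["msg-1", "msg-2"])
bes.flush()                       # NOC is up: both messages go through

noc.up = False                    # simulate the outage
bes.poll_exchange(["msg-3", "msg-4", "msg-5"])
bes.flush()                       # nothing moves; mail backs up on the BES
backlog_during_outage = len(bes.outbound)

noc.up = True                     # service restored
bes.flush()                       # queued messages drain in order
```

In this toy model nothing is lost during the outage; messages simply arrive late, which matches what users in this story reported.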

According to Datoo, Zenprise started to see a problem with Alameda County’s BlackBerry e-mail service at 2:58 p.m. Pacific Time on Tuesday: e-mails were backing up on the BES. The Zenprise software began running automated diagnostic tests and concluded that the problem lay in RIM’s data center.

For about two hours, e-mails would back up as the RIM network went down and then start to go through as, apparently, the RIM network momentarily recovered. But at about 5 p.m. PT, according to Datoo, the RIM net apparently failed catastrophically and no e-mails were getting through.
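The article does not say how Zenprise's diagnostics work, so the following is only a guess at the general idea: watch the BES queue over time, then rule out local causes before blaming the upstream network. The `Sample` fields, the three-sample window, the labels, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int      # minutes since monitoring started
    queued: int      # messages waiting on the BES
    forwarded: int   # messages forwarded to the NOC since the last sample


def classify(samples, window=3):
    """Label the most recent `window` samples of BES queue behavior."""
    recent = samples[-window:]
    if all(s.queued == 0 for s in recent):
        return "healthy"
    if all(s.forwarded == 0 for s in recent):
        return "stalled"          # nothing is getting through
    return "intermittent"         # mail backs up, then drains in bursts


def locate_fault(exchange_ok, firewall_ok, state):
    """Process of elimination: local checks pass but mail is stuck, so look upstream."""
    if state == "healthy":
        return "none"
    if not exchange_ok:
        return "exchange"
    if not firewall_ok:
        return "local network"
    return "upstream (NOC or carrier)"


# Made-up readings: bursts of delivery at first, then a complete stop.
samples = [
    Sample(0, 0, 12),
    Sample(5, 40, 0),
    Sample(10, 5, 60),
    Sample(15, 55, 0),
    Sample(20, 90, 0),
    Sample(25, 130, 0),
]

early_state = classify(samples[:3])
late_state = classify(samples)
suspect = locate_fault(exchange_ok=True, firewall_ok=True, state=late_state)
```

With these invented readings, the early window is labeled intermittent and the later one stalled, mirroring the two phases Datoo describes.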

Like most other enterprise users, Allen Matkins' Gilman lacked the insight a program like Zenprise could have provided. Responding to user queries, he ran through checks and troubleshooting on the law firm’s network before he, too, finally concluded that something had to be wrong at the RIM NOC.

Gilman finally started getting his missing e-mails delivered at about 4 a.m. PT. He knows because he keeps his BlackBerry on the bedside table.

Nearly every lawyer and all of the top management staff at Allen Matkins have the e-mail devices - about 250 in all - which were first adopted in 2000.

“One guy [in the firm] said, ‘I feel like I’m back in the '90s,’” Gilman says. “You become so used to getting your e-mail in the blink of an eye. You hit ‘send’ and you just assume the other person has got what you sent. And 99% of the time, that’s true.”

Most of the firm’s users could have used Web e-mail if necessary, Gilman says.

Roughly the same number of users, at Oregon State University in Corvallis, seems to have been completely unaffected by the outage. “I was unaware of any service problems,” says Jon Dolan, associate director of network services. “We’ve gotten no complaints from our user base.”

There was a gap in his messages from 9 p.m. to 1 a.m. PT Tuesday night, but such a gap is not unheard of. For Dolan, BlackBerry is a way for him to be continually updated with alert messages from an array of different university computer systems and applications.

Part of the fallout that RIM will have to deal with is the lack of information and help it provided its customers. The help desk at CareGroup Healthcare System, a Boston, Mass., hospital group, started getting user complaints and researched the problem by reviewing local and national news Web sites, says CIO John Halamka. Concluding that the RIM service was down, the help desk sent out an e-mail alert - via the desktop e-mail system - to nearly 500 BlackBerry users, telling them about the problem and suggesting alternate methods for sending and receiving e-mails.

Service was down for several hours for CareGroup, and some users reported losing browsing capabilities over the AT&T/Cingular network, he says. Service started to return by about 6 a.m. ET Wednesday and was nearly completely returned by 9:30.

As of late Wednesday afternoon, according to Halamka, “We received no communication from RIM, nor was anything on their Web site or [on] BlackBerry.com to give customers any information on their status.”

Another issue is whether the unusual outage will trigger penalties under service level agreements that enterprise users have with RIM. Gilman says that problems have been so rare and so mild that he can’t recall the last time he glanced at the contract to know what the SLA terms are. But he thinks it’s unlikely that RIM will be formally penalized.

Don’t expect a mass exodus from RIM, either, according to Zenprise’s Datoo. “I think [such events] are the cost of doing business with mobile devices,” he says. “The reality is they are outside the control of the enterprise. Ultimately, these devices are not on your network. The question then becomes what can enterprises do to manage this kind of complexity?”
