COVID-19: Weekly health check of ISPs, cloud providers and conferencing services

ThousandEyes, which tracks internet and cloud traffic, is providing Network World with weekly updates on the performance of three categories of service provider: ISP, cloud provider, UCaaS

As COVID-19 continues to spread, forcing employees to work from home, ISPs, cloud providers and conferencing services, a.k.a. unified communications as a service (UCaaS) providers, are experiencing increased traffic.

ThousandEyes is monitoring how these increases affect outages and the performance challenges these providers undergo. It will provide Network World a roundup of interesting events of the week in the delivery of these services, and Network World will provide a summary here. Stop back next week for another update, and see more details here.

Update Sept. 21

The number of outages reported globally in all three categories was 230 for the week Sept. 14-20, up 50% from 153 the week before. In the U.S., the count was 73 for the latest week, up two from the week before.

ISP outages rose 62%, from 108 to 175 worldwide, and from 57 to 63 in the U.S., an increase of 11%.

Public-cloud provider outages were down a third, from 12 to eight, with the count in the U.S. dropping from five to just one.

Collaboration-app network providers suffered a single outage this week, with that one occurring in the U.S. The week before, there were none.
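The week-over-week percentages in these updates follow from the raw counts. As a minimal sketch (the function name and the dictionary labels are illustrative assumptions, not anything ThousandEyes publishes), the change can be computed like this:

```python
def percent_change(previous: int, current: int) -> float:
    """Return the week-over-week change as a percentage of the earlier count."""
    if previous == 0:
        # A rise from zero (e.g. collaboration-app outages going from 0 to 1)
        # has no meaningful percentage, so it is reported as a raw count instead.
        raise ValueError("percent change is undefined when the previous count is zero")
    return (current - previous) / previous * 100


# Counts taken from the Sept. 21 update: (week before, latest week).
weekly_counts = {
    "all outages, global": (153, 230),
    "ISP outages, global": (108, 175),
    "ISP outages, U.S.": (57, 63),
    "cloud outages, global": (12, 8),
}

for label, (prev, curr) in weekly_counts.items():
    print(f"{label}: {prev} -> {curr} ({percent_change(prev, curr):+.0f}%)")
```

The zero check explains why a category that had no outages the week before is described by its count rather than by a percentage.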

Instagram and Amazon suffered notable outages during the week.

About 11:10 a.m. PDT on Sept. 17, Instagram experienced a service disruption that prevented many users worldwide from using the application. With no network or reachability issues with its front-end servers, and users receiving HTTP 502 error notifications, the cause appeared to be an application back-end issue. Service began to return about 11:15 a.m. PDT, with full service restored by 11:45 a.m. PDT. Click here for an interactive view of the outage.
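The reasoning in the Instagram case (front-end servers reachable, but requests answered with HTTP 502) can be expressed as a simple rule of thumb. The sketch below is a hypothetical heuristic for illustration only; the function name and labels are assumptions, and it is not a description of how ThousandEyes classifies outages.

```python
from typing import Optional


def classify_disruption(tcp_reachable: bool, http_status: Optional[int]) -> str:
    """Roughly separate network problems from application problems.

    tcp_reachable: whether a connection to the front-end server succeeded.
    http_status:   the HTTP status code returned, or None if no response arrived.
    """
    if not tcp_reachable:
        # The server could not be reached at all: points to the network path.
        return "network"
    if http_status is None:
        # Connected but got no HTTP response: not enough evidence either way.
        return "unknown"
    if 500 <= http_status <= 599:
        # Reachable front end returning a server error such as 502 Bad Gateway:
        # points to the application or its back end.
        return "application"
    return "no server-side fault observed"


print(classify_disruption(tcp_reachable=True, http_status=502))
```

A real diagnosis would draw on many vantage points and path measurements; this only captures the single inference described above.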

About 2:45 p.m. EDT on Sept. 14, Amazon suffered a 29-minute outage centered on nodes in Columbus, Ohio, and affecting Amazon cloud-compute instances at its Hilliard, Ohio, data center. The outage affected 99 interfaces and was contained to the one location. As a result, some users experienced non-responsive or slow EC2 instances. The outage was cleared just past 3 p.m. EDT.

Update Sept. 14

Globally the number of outages observed between Sept. 7 and 13 in all three categories decreased by 40% from the week before, from 256 to 153, the lowest figure observed since early February. In the U.S., the number of outages dropped from 134 to 71, a 47% decrease.

The number of ISP outages worldwide dropped 50% from 216 to 108. In the U.S. the decrease was 54% from 123 to 57, the lowest weekly number since early April.

Cloud-provider outages globally decreased by 25% from 16 to 12. In the U.S. they remained at five for the second consecutive week.

Globally and in the U.S. for the first week since early August, no collaboration app network outages were recorded.

Cogent Communications suffered three outages starting about 11:45 a.m. EDT on Sept. 11, lasting a total of 36 minutes. The three lasted 13 minutes, four minutes and 19 minutes, spread across just over an hour. All three centered on a Cogent node in Newark, N.J., and affected customers across the U.S. and also in the U.K., Netherlands, Canada, Mexico and India. The customers were using the network to access services such as Visa online services, Microsoft Office, and Shopify. The fact that the outages occurred during business hours, together with their focus, indicates that some form of control-plane condition was the cause. The problem was cleared about 12:50 p.m. EDT.

Update Sept. 7

Globally the number of outages observed in all three categories decreased by 33% from the week prior, from 381 to 256. In the U.S., the number of outages dropped by 46, decreasing from 180 to 134, a 26% decrease from the week prior.

Worldwide, the number of ISP outages decreased by 103, dropping from 319 to 216, a 32% decrease and accounting for 84% of all outages observed this week. In the U.S., the number of ISP outages decreased by 27, dropping from 150 to 123, an 18% decrease.

Cloud provider outages globally increased by a third from 12 to 16. In the U.S., outages more than doubled, rising from two to five.

There was just one collaboration app provider outage worldwide -- not in the U.S. -- down from two.

PCCW Global suffered two outages starting about 12:40 a.m. EDT on Sept. 3, one lasting 20 minutes and the other six. The first centered on PCCW nodes located in Atlanta, Ga., and affected services using the Charlotte Colocation and Affiliated Computer Services networks. The second started about half an hour after the first cleared, centered on PCCW nodes in Ashburn, Va., and affected access to Oracle Cloud services. Both outages were cleared by 1:45 a.m. EDT. The cause was likely a traffic-engineering exercise.

About 6 p.m. PDT, Comcast suffered a four-minute outage that affected users in the western U.S. It centered on Comcast core devices in Sunnyvale, Calif., and mainly affected services across Comcast Xfinity networks (Comcast Cable Communications). The outage would likely have caused internet connectivity slowdowns and disruption for users.

Update Aug. 31

Globally the number of outages observed across all three categories increased by 29% from the week prior, rising from 296 to 381. This was the largest number of outages recorded in a single week this year. In the U.S. outages increased 70% compared to the week prior from 106 to 180.

The vast majority of outages were due to ISP problems. Worldwide the number jumped from 214 to 319, with the count in the U.S. growing from 80 to 150.

Public cloud outages declined worldwide from 27 to 12 and from four to two in the U.S.

Collaboration app network outages stayed steady at two worldwide, with both of them occurring in the U.S., where the count rose from zero to two.

CenturyLink suffered a major outage just after 6 a.m. EDT on Aug. 30 that hit a broad range of providers and businesses including Twitter, Microsoft (Xbox Live), Discord, Reddit, Cloudflare, OpenDNS, and Hulu. Shortly after the outage began, providers started rerouting traffic from CenturyLink to alternate providers in an effort to alleviate the impact. However, given the size and distribution of CenturyLink’s network, many services were still unreachable, ThousandEyes said. At 8:13 a.m. EDT, CenturyLink announced it was investigating issues affecting some services within its Mississauga, Ontario, Canada data center. Having identified the cause as an incorrect flowspec announcement from the Mississauga data center, CenturyLink requested that its Tier 1 Internet provider partners de-peer and ignore any traffic coming from its network. (BGP flow specification (flowspec) is a feature that allows you to rapidly deploy and propagate filter policies among a large number of BGP peer routers.) In order to resolve the issue, CenturyLink reset all the equipment and started with clean BGP routing tables, a process that took almost five hours to complete. Just before 3 p.m. EDT, CenturyLink announced that the issue had been resolved and all services had been restored.

Update Aug. 24

Globally the total number of outages observed across all three categories during the week Aug. 17-23 increased by 21% compared to the week prior, rising from 245 to 296. In the U.S., outages rose from 90 to 106, an increase of 18% from the week prior.

ISP outages worldwide rose from 166 to 214 and from 72 to 80 in the U.S.

Public cloud network outages dropped worldwide from 28 to 27, and stayed the same in the U.S. at four.

Collaboration app network outages rose from zero to two globally, but remained at zero in the U.S.

ThousandEyes flagged three notable outages during the week.

Just after 8 a.m. EDT on Aug. 18, Spotify suffered an outage that prevented users from streaming songs from the service. The outage lasted just over an hour, during which the service would play songs for a few seconds, then pause and return an error. The outage is believed to be associated with an expired TLS certificate. Click here for an explanation of the impact of certificate expiration.

About 11:30 p.m. EDT on Aug. 17, Equinix suffered a power outage to a colocation center in Docklands, London. About 2 a.m. the failure of an output static switch from a UPS system triggered a fire alarm, resulting in loss of power for multiple customers. At 3:50 a.m. services started to be restored and were fully restored by 4:50 p.m. EDT. Affected customers included BT, Sky, Virgin Media, Giganet, Epsilon, SiPalto, EX Networks, Fast2Host, ICUK.net, and Evoke Telecom.

About 10:50 p.m. PDT on Aug. 19 Cogent Networks suffered a 36-minute outage affecting U.S. users’ access to Microsoft networks and associated services, as well as CDN content for services such as TikTok and ESPN. The outage affected nodes across the U.S. and apparently resulted from a configuration adjustment. A second outage two hours later at 11:26 p.m. PDT lasted 24 minutes and likely was connected to the first outage’s configuration adjustment. It affected users in the U.S., Asia-Pacific and Europe, Mid-East and Africa. Click here for an interactive view of the outages.

Update Aug. 17

Global outages across all three categories fell between the weeks of Aug. 3-9 and Aug. 10-16 from 294 to 245 (-17%) and in the U.S. from 123 to 90 (-27%).

ISP outages dropped worldwide from 227 to 166 and from 109 to 72 in the U.S.

Public cloud outages worldwide fell from 30 to 28 and from five to four in the U.S.

Collaboration app network outages worldwide remained at 0 for the second week in a row.

Cogent Networks suffered a notable outage at about 10:30 p.m. EDT on Aug. 13 that lasted about 40 minutes and affected its Atlanta, Ga., network. It affected access to Microsoft networks and associated services, such as SharePoint, Office, Azure services and hosting, and appeared to be located in the Cogent data center in Atlanta. Based on the affected interfaces and nodes, it appears to have been a result of configuration adjustments rather than a control-plane issue.

Separately, BT incurred an outage on its European backbone about 7:30 p.m. EDT affecting customers and partners in the U.K., U.S., Sweden, and Germany. The outage came in three four-minute intervals spanning 25 minutes, indicating an automated restoration process, and likely was for maintenance. The outage cleared at 7:55 p.m. EDT.

Update Aug. 10

Globally, there were no collaboration app network provider outages observed this week. In the U.S., this is the second consecutive week of zero outages.

Overall the number of outages in all three categories increased from 248 to 294, the highest tally since late April. In the U.S. the total was up from 99 to 123.

ISP outages globally increased from 181 to 227. In the U.S. the increase was from 88 to 109.

Cloud-provider outages worldwide rose from 18 to 30, and in the U.S. increased from three to five.

Collaboration app network outages dropped from 1 to 0. U.S. outages remained at 0.

About 8:25 p.m. PDT on Aug. 4, Cogent Networks experienced a 15-minute network disruption affecting parts of its San Francisco network and its infrastructure in the U.K., Germany and the Netherlands. It affected nearly 70 network interfaces. The scope and timing of the disruption indicate the provider was making service adjustments or performing maintenance. An interactive visualization of the outage is here.

About 3:25 a.m. CDT on Aug. 5, GTT had a 10-minute network disruption affecting parts of its infrastructure in Dallas, Chicago, Los Angeles, and London. The timing and scope of the disruption are consistent with service-adjustment activity. An interactive visualization of the outage is here.

Update Aug. 3

During the week of July 27-Aug. 2, the number of outages globally in all three categories decreased by 6% from the week prior, from 263 to 248. In the U.S., outages rose from 90 to 99, a 10% increase from the week prior.

The number of ISP outages globally decreased by 1%, dropping from 183 to 181. In the U.S., ISP outages rose from 73 to 88, a 21% increase compared to the week prior.

Worldwide cloud provider outages decreased by 38% when compared to the week prior. In the U.S., there were three public cloud network outages for the third consecutive week.

Globally, collaboration app network provider outages decreased from three to one, a drop of 67% when compared to the week prior. In the U.S., no collaboration app network outages were recorded this week.

There were two noteworthy outages during the period:

Verizon Business suffered an outage within its network that affected users accessing services such as Zoom, Bloomberg Professional and Flagstar Bank. The outage centered on former UUNET nodes located in San Jose, Calif., and Seattle. It began just before 11 a.m. PDT on July 27 and lasted a total of 27 minutes over a 55-minute period. The outage cleared around 11:55 a.m. PDT.

Reddit users began to experience errors when accessing Reddit's site around 10:30 a.m. EDT on July 29. During the incident, the Reddit site was reachable, but many of the page components produced errors, either failing to load or simply not responding to requests, all of which is indicative of an application issue as opposed to a network disruption. Reddit implemented a fix at 1:32 p.m. EDT and announced that the issue had been resolved at 3:24 p.m. EDT.

Update July 27

During the week July 20-26, the number of outages globally in all three categories increased by 14% from the week prior, from 231 to 263. In the U.S., outages rose from 70 to 90, a 29% increase from the week prior.
