Global internet health check and network outage report

ThousandEyes, which tracks internet and cloud traffic, provides Network World with weekly updates on the performance of ISPs, cloud service providers, and UCaaS providers.

There was just one collaboration app provider outage worldwide -- not in the U.S. -- down from two.

PCCW Global suffered two outages starting about 12:40 a.m. EDT Sept. 3, one lasting 20 minutes and the other six. The first centered on PCCW nodes in Atlanta, Ga., and affected services using the Charlotte Colocation and Affiliated Computer Services networks. The second started about half an hour after the first cleared, centered on PCCW nodes in Ashburn, Va., and affected access to Oracle Cloud services. Both outages were cleared by 1:45 a.m. EDT. The likely cause was a traffic-engineering exercise.

About 6 p.m. PDT, Comcast suffered a four-minute outage that affected users in the western U.S. It centered on Comcast core devices in Sunnyvale, Calif., and mainly affected services across Comcast Xfinity networks (Comcast Cable Communications). The outage would likely have caused internet connectivity slowdowns and disruption for users.

Update Aug. 31

Globally, the number of outages observed across all three categories increased by 29% from the week prior, rising from 296 to 381. This was the largest number of outages recorded in a single week this year. In the U.S., outages increased 70% compared to the week prior, from 106 to 180.
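The week-over-week percentages in these updates follow from the raw counts. As a minimal sketch (the function name `percent_change` is illustrative and not part of any ThousandEyes tooling), the figures for this week can be reproduced like this:

```python
def percent_change(previous: int, current: int) -> float:
    """Return the week-over-week change as a percentage of the previous count."""
    if previous == 0:
        # A rise from zero has no meaningful percentage.
        raise ValueError("percent change is undefined when the previous count is zero")
    return (current - previous) / previous * 100


# Counts taken from the Aug. 31 update: all outages, global and U.S.
global_change = percent_change(296, 381)
us_change = percent_change(106, 180)

print(f"Global: {global_change:.0f}%")
print(f"U.S.: {us_change:.0f}%")
```

Rounded to whole numbers, the results match the reported 29% and 70%. The zero check matters for categories such as collaboration apps, where counts often start from zero and only the raw numbers can be reported.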

The vast majority of outages were due to ISP problems. Worldwide the number jumped from 214 to 319, with the count in the U.S. growing from 80 to 150.

Public cloud outages declined worldwide from 27 to 12 and from four to two in the U.S.

Collaboration app network outages held steady at two worldwide, with both occurring in the U.S., where the count rose from zero to two.

CenturyLink suffered a major outage just after 6 a.m. EDT Aug. 30 that hit a broad range of providers and businesses including Twitter, Microsoft (Xbox Live), Discord, Reddit, Cloudflare, OpenDNS, and Hulu. Shortly after the outage began, providers started rerouting traffic from CenturyLink to alternate providers in an effort to alleviate the impact. However, given the size and distribution of CenturyLink’s network, many services were still unreachable, ThousandEyes said. At 8:13 a.m. EDT, CenturyLink announced it was investigating issues affecting some services within its Mississauga, Ontario, Canada data center. Having identified the cause as an incorrect flowspec announcement from the Mississauga data center, CenturyLink requested that its Tier 1 internet provider partners de-peer and ignore any traffic coming from its network. (BGP flow specification (flowspec) is a feature that allows you to rapidly deploy and propagate filter policies among a large number of BGP peer routers.) To resolve the issue, CenturyLink reset all the equipment and started with clean BGP routing tables, a process that took almost five hours to complete. Just before 3 p.m. EDT, CenturyLink announced that the issue had been resolved and all services had been restored.

Update Aug. 24

Globally, the total number of outages observed across all three categories during the week Aug. 17-23 increased by 21% compared to the week prior, rising from 245 to 296. In the U.S., outages rose from 90 to 106, an increase of 18% from the week prior.

ISP outages worldwide rose from 166 to 214 and from 72 to 80 in the U.S.

Public cloud network outages dropped worldwide from 28 to 27, and stayed the same in the U.S. at four.

Collaboration app network outages rose from zero to two globally, but remained at zero in the U.S.

ThousandEyes flagged three notable outages during the week.

Just after 8 a.m. EDT on Aug. 18, Spotify suffered an outage that prevented users from streaming songs from the service. The outage lasted just over an hour, during which the service would play songs for a few seconds, then pause and return an error. The outage is believed to be associated with an expired TLS certificate. Click here for an explanation on the impact of certificate expiration.
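Certificate expiry is predictable, so it can be checked ahead of time. The following is a minimal sketch of such a check, not a description of Spotify's tooling: the function names, the 30-day threshold, and the dates are all hypothetical. It relies only on `ssl.cert_time_to_seconds` from the Python standard library, which parses the `notAfter` format that Python's `ssl` module reports.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before a certificate's notAfter time; negative if already expired.

    `not_after` uses the format Python's ssl module reports,
    e.g. "Jan 31 00:00:00 2030 GMT". `now` is seconds since the Unix epoch.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical dates chosen for illustration only.
now = ssl.cert_time_to_seconds("Jan 21 00:00:00 2030 GMT")
print(days_until_expiry("Jan 31 00:00:00 2030 GMT", now))
print(needs_renewal("Jan 31 00:00:00 2030 GMT", now))
```

In practice the `notAfter` string would come from the certificate a server presents, and `now` from the current clock. Passing both in explicitly keeps the check easy to test.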

About 11:30 p.m. EDT on Aug. 17, Equinix suffered a power outage to a colocation center in Docklands, London. About 2 a.m. the failure of an output static switch from a UPS system triggered a fire alarm, resulting in loss of power for multiple customers. At 3:50 a.m. services started to be restored and were fully restored by 4:50 p.m. EDT. Affected customers included BT, Sky, Virgin Media, Giganet, Epsilon, SiPalto, EX Networks, Fast2Host, ICUK.net, and Evoke Telecom.

About 10:50 p.m. PDT on Aug. 19 Cogent Networks suffered a 36-minute outage affecting U.S. users’ access to Microsoft networks and associated services, as well as CDN content for services such as TikTok and ESPN. The outage affected nodes across the U.S. and apparently resulted from a configuration adjustment. A second outage two hours later at 11:26 p.m. PDT lasted 24 minutes and likely was connected to the first outage’s configuration adjustment. It affected users in the U.S., Asia-Pacific and Europe, Mid-East and Africa. Click here for an interactive view of the outages.

Update Aug. 17

Global outages across all three categories fell between the weeks of Aug. 3-9 and Aug. 10-16 from 294 to 245 (-17%) and in the U.S. from 123 to 90 (-27%).

ISP outages dropped worldwide from 227 to 166 and from 109 to 72 in the U.S.

Public cloud outages worldwide fell from 30 to 28 and from five to four in the U.S.

Collaboration app network outages worldwide remained at 0 for the second week in a row.

Cogent Networks suffered a notable outage at about 10:30 p.m. EDT on Aug. 13 that lasted about 40 minutes and affected its Atlanta, Ga., network. It affected access to Microsoft networks and associated services, such as SharePoint, Office, Azure services and hosting, and appeared to be located in the Cogent data center in Atlanta. Based on the affected interfaces and nodes, it appears to have been the result of configuration adjustments rather than a control-plane issue.

Separately, BT incurred an outage on its European backbone about 7:30 p.m. EDT affecting customers and partners in the U.K., U.S., Sweden, and Germany. The outage came in three four-minute intervals spanning 25 minutes, indicating an automated restoration process, and likely was for maintenance. The outage cleared at 7:55 p.m. EDT.

Update Aug. 10

Globally, there were no collaboration app network provider outages observed this week. In the U.S., this is the second consecutive week of zero outages.

Overall, the number of outages in all three categories increased from 248 to 294, the highest tally since late April. In the U.S., the total was up from 99 to 123.

ISP outages globally increased from 181 to 227. In the U.S., the increase was from 88 to 109.

Cloud-provider outages worldwide rose from 18 to 30, and in the U.S. increased from three to five.

Collaboration app network outages dropped from 1 to 0. U.S. outages remained at 0.

About 8:25 p.m. PDT on Aug. 4, Cogent Networks experienced a 15-minute network disruption affecting parts of its San Francisco network and its infrastructure in the U.K., Germany and the Netherlands. It affected nearly 70 network interfaces. The scope and timing of the disruption indicate the provider was making service adjustments or performing maintenance. An interactive visualization of the outage is here.

About 3:25 a.m. CDT on Aug. 5, GTT had a 10-minute network disruption affecting parts of their infrastructure in Dallas, Chicago, Los Angeles, and London. The timing and scope of the disruption are consistent with service-adjustment activity. Interactive visualization of the outage is here.

Update Aug. 3

During the week of July 27-August 2, the number of outages globally in all three categories decreased by 6% from the week prior, from 263 to 248. In the U.S., outages rose from 90 to 99, a 10% increase from the week prior.

The number of ISP outages globally decreased by 1%, dropping from 183 to 181. In the U.S., ISP outages rose from 73 to 88, a 21% increase compared to the week prior.

Worldwide cloud provider outages decreased by 38% when compared to the week prior. In the U.S., there were three public cloud network outages for the third consecutive week.

Globally, collaboration app network provider outages decreased from 3 to 1, a drop of 67% when compared to the week prior. In the U.S., no collaboration app network outages were recorded this week.

There were two noteworthy outages during the period:

Verizon Business suffered an outage within its network that impacted users accessing services such as Zoom, Bloomberg Professional and Flagstar Bank. The outage centered on former UUNET nodes located in San Jose, Calif., and Seattle. The outage began just before 11 a.m. PDT on July 27 and lasted a total of 27 minutes over a 55-minute period. It cleared around 11:55 a.m. PDT.

Reddit users began to experience some errors when accessing Reddit's site around 10:30 a.m. EDT on July 29. During the incident, the Reddit site was reachable, but many of the page components produced errors, either failing to load or simply not responding to requests, all of which is indicative of an application issue as opposed to a network disruption. A fix was implemented by Reddit at 1:32 p.m. EDT, and Reddit announced that the issue had been resolved at 3:24 p.m. EDT.

Update July 27

During the week July 20-26, the number of outages globally in all three categories increased by 14% from the week prior, from 231 to 263. In the U.S., outages rose from 70 to 90, a 29% increase from the week prior.

The number of ISP outages globally increased by 5%, from 175 to 183. In the U.S., ISP outages rose from 60 to 73, a 22% increase and a return to late June levels.

Cloud-provider outages worldwide were almost double, increasing 93%, from 15 to 29. In the U.S., there were three public cloud network outages for the second consecutive week.

Globally, collaboration-app network provider outages increased from 1 to 3, a rise of 200%, with all outages attributed to a single provider in the U.S. These were the first collaboration outages seen domestically since mid-June.

The most noteworthy outage of the week occurred just after 3:15 a.m. EDT on July 23 when services on Garmin.com and Garmin Connect became interrupted. The outage – which at the time of this writing is ongoing – also affects Garmin call centers, which were unable to receive calls and emails or participate in online chats. The network connectivity to Garmin services remains active, but syncing data and accessing functions on Garmin Connect remain down. Since Thursday, users attempting to access these functions have been met with a “Server Maintenance” message. In a press release on the 27th, Garmin confirmed it suffered a cyber attack that encrypted some of their systems, resulting in many of their online services being interrupted.

Update July 20

During the week of July 13-19, global outages of all three kinds dropped 19% from the week before, from 285 to 231. The drop in U.S. outages was even greater, 28%, from 97 to 70.

ISP outages dropped globally from 215 to 175 or 19%. In the U.S. they dropped 34%, from 91 to 60.

Cloud provider outages dropped 58%, from 36 to 15, and most of those occurred in South America. U.S. outages rose from two to three, or 50%.

Globally, collaboration-app network outages decreased from four to one, a drop of 75%, with the outage attributed to a single provider in the U.K. There were no outages in the U.S. for the fifth week in a row.

GitHub suffered an outage just after 2:30 a.m. EDT July 13 that lasted until 4:31 a.m. EDT. Users were affected worldwide. GitHub hasn’t provided details about what caused the outage, but ThousandEyes said there are indications that the source was within GitHub services.

WhatsApp suffered an outage for about an hour on July 14 starting about 6:45 p.m. EDT that prevented users globally from sending and receiving messages on the service. Once the outage was over, users could connect to the service, but once loaded they were unable to execute any functions. WhatsApp confirmed to ThousandEyes that the cause was an internal update to servers.

Update July 6

For the week June 29-July 5, the number of global outages across all three categories increased from 199 to 208, a 5% increase. In the U.S., however, outages dropped from 83 to 63, a 24% decrease from the week prior.

Globally, the number of ISP outages decreased 5%, from 160 to 152. The number of U.S. ISP outages decreased as well, from 77 to 55 outages. Both drops represent the lowest numbers of ISP outages since February.

Worldwide, cloud-provider outages decreased by 11%, from 28 to 25. The lone cloud-provider outage recorded in the U.S. this week was a decrease of 80% from five outages the week before.

Globally, collaboration-app network provider outages increased from 0 to 2, the first outages recorded since early June. The U.S. had zero collaboration app outages this week, recording just two outages in all of June.

There were two noteworthy outages during the period:

On June 29 at 8:15 a.m. PDT, a power failure affected the Google Compute Engine in service zones us-east1-c and us-east1-d. Customers experiencing the service interruption would not have been able to reach existing virtual machines or create new ones. Other zones in the region were not impacted, so a redundant architecture, where workloads are hosted in multiple zones within a region, would have mitigated user impact. Google announced that all services had been restored and the issues resolved at 1:06 p.m. PDT.
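The point about zonal redundancy can be illustrated with a small model. This is a sketch under stated assumptions rather than a description of any customer's deployment: the two placements below are hypothetical, the function names are illustrative, and the only facts taken from the report are the two failed zones.

```python
def reachable_zones(deployed_zones: set, failed_zones: set) -> set:
    """Zones where the workload is deployed and that are still up."""
    return deployed_zones - failed_zones


def survives(deployed_zones: set, failed_zones: set) -> bool:
    """True if at least one deployed zone remains reachable."""
    return bool(reachable_zones(deployed_zones, failed_zones))


# Zones reported as affected by the June 29 power failure.
failed = {"us-east1-c", "us-east1-d"}

# Hypothetical placements, for illustration only.
single_zone = {"us-east1-c"}
multi_zone = {"us-east1-b", "us-east1-c"}

print(survives(single_zone, failed))
print(survives(multi_zone, failed))
```

Under this model the single-zone placement is fully offline, while the multi-zone placement keeps serving from the zone that was not hit. A real deployment would also need enough spare capacity in the surviving zone to absorb the shifted load.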
