Weekly internet health check, US and worldwide

The reliability of services delivered by ISPs, cloud providers and conferencing services (a.k.a. unified communications-as-a-service (UCaaS)) is an indication of how well served businesses are via the internet.

ThousandEyes is monitoring how these providers are handling the performance challenges they face. It will provide Network World a roundup of interesting events of the week in the delivery of these services, and Network World will provide a summary here. Stop back next week for another update, and see more details here.

Update Jan. 12

Global outages in all three categories increased from 96 to 157, up 64% from the week before, and in the US they jumped from 33 to 88, up 167%.

ISP outages worldwide went from 71 to 122, up 72%, and from 30 to 74 in the US, up 147%.
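The percentages in these updates are week-over-week changes relative to the previous week's count. As a minimal sketch of that arithmetic (the function name `percent_change` is illustrative and not part of any ThousandEyes tooling), the ISP figures above can be reproduced like this:

```python
from typing import Optional


def percent_change(previous: int, current: int) -> Optional[float]:
    """Return the change from previous to current as a percentage of previous.

    Returns None when previous is zero, because a percentage change
    from zero is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100


# ISP outage counts quoted in the Jan. 12 update: (label, previous, current)
for label, prev, curr in [("worldwide", 71, 122), ("US", 30, 74)]:
    change = percent_change(prev, curr)
    print(f"ISP outages {label}: {prev} -> {curr}, {change:+.0f}%")
```

Because the change is measured against the earlier count, a decrease can never exceed 100%, and a rise from zero has no percentage at all, which is why those cases are reported as raw counts.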

Globally, cloud provider network outages increased from 2 to 7, a 250% increase, and from zero to two in the US.

There was just one collaboration-app network outage and that occurred in the US. There were none the week before.

There were two notable outages during the week. On Jan. 4, Slack experienced an outage at 10 a.m. EST that lasted until after 1:40 p.m. It affected customers worldwide, with many users unable to log in, send or receive messages, or place or answer calls. Slack identified the cause as insufficient router capacity in its cloud-provider network to meet customer demand. Starting at 11:15 a.m. EST, Slack implemented a fix, and many customers could use the service again by 12:15 p.m. Slack announced messaging service restoration at 1:40 p.m. EST, although its calendar integration features took longer to restore.

On Jan. 7, Cogent Communications experienced an outage at 4:40 p.m. that lasted just under an hour and affected downstream providers and Cogent customers globally. It consisted of four outage occurrences over a two-hour period, the first of which centered on Cogent nodes in Amsterdam, the Netherlands, mainly affecting European countries. Five minutes later, Cogent nodes in Washington, DC, also exhibited outage conditions. At this point the Amsterdam nodes recovered, but the Washington, DC, nodes stayed down for another 35 minutes. Thirty-five minutes after the first outage cleared, the second outage was observed, centering on nodes in Oakland, CA. It lasted four minutes and affected only customers in the US. This was repeated five minutes later, this time lasting around three minutes. Following a five-minute break, a final four-minute outage was observed, this time centering on Cogent nodes in Las Vegas, NV, and Oakland, CA. The outage affected access to services including Amazon, Yandex (a Russian-based search engine), Oracle, and Sberbank (a state-owned Russian banking and financial services company). The outage was cleared around 6:35 p.m. EST. Click here for an interactive view of the outage.

Update Jan. 5

Outages in all three categories decreased from 172 to 96, a 44% decrease compared to the week prior. In the US, they decreased from 80 to 33, a 59% decrease.

Globally, ISP outages decreased from 135 to 71, down 47%. In the US, they dropped from 74 to 30, a 59% decrease.

Cloud-provider network outages decreased from five to two, and in the US, from two to zero.

There were no collaboration app network outages the previous two weeks.

Update Dec. 21

Total outages across all three categories dropped vs. the previous week, from 252 to 193, a 23% decrease. In the US, outages decreased from 115 to 89, also a 23% decrease.

Globally, the number of ISP outages decreased from 180 to 145, a 19% decrease, and in the US they decreased from 97 to 75, a 23% drop.

Cloud-provider network outages worldwide decreased from 11 to four, down 64%, while in the US they fell from 2 to 1.

There were three collaboration-app network outages during the week, all in the US. The week before there were four outages, none of them in the US.

There were two notable outages. On Dec. 14 between 6:50 a.m. and 7:30 a.m. EST, Google experienced a global outage. ThousandEyes tests measured elevated server wait times, indicating the application was taking longer to respond to service requests. During the service disruption, network paths connecting to Google’s edge servers did not show any traffic loss.

The other notable outage hit NTT America and affected some downstream providers and NTT networks in multiple countries including the US, Germany, Brazil, the UK, and Canada. The outage was first observed around 8:30 a.m. EST and appeared to be centered on NTT infrastructure in Los Angeles, California, and Seattle, Washington. The outage lasted just over 19 minutes and was cleared around 8:50 a.m. EST. Click here for an interactive view of the outage.

Update Dec. 14

Total outages in all three categories were up 26%, from 200 to 252, over the previous week, and they were up 39% in the US, from 83 to 115.

ISP outages worldwide increased from 129 to 180, a 40% increase. In the US, ISP outages increased from 66 to 97, a 47% increase.

Worldwide cloud-provider network outages increased from eight to 11, up 38%. In the US they increased from one to two.

Globally, there were four collaboration-app network outages, up from zero. None of them were in the US.

There were two notable outages during the week. On Dec. 10, Hurricane Electric experienced a 17-minute outage that hit users in the US, Canada, Germany, Egypt, Sweden, France, and the UK. The outage was first observed around 2:11 p.m. EST centered on Hurricane Electric infrastructure in Atlanta, Georgia, and 10 minutes later just in Dallas, Texas. The last two minutes affected Hurricane Electric interfaces in both Atlanta and New York, New York. The issue was cleared around 2:38 p.m. EST. Click here for an interactive view of the outage.

On Dec. 8, Cogent Communications experienced an outage that, though lasting only four minutes, affected multiple downstream providers as well as Cogent customers globally. The outage was first observed around 4:50 p.m. PST across Cogent’s global infrastructure, with Cogent nodes in the US, Germany, UK, Spain, France, Switzerland, and Ireland all reflecting the outage. The outage affected access to services including Microsoft, Amazon, SAP, Disney Streaming, and Wells Fargo. The outage was cleared around 4:55 p.m. PST. Click here for an interactive view of the outage.

Update Dec. 7

Worldwide outages in all three categories were up compared to the week before, from 159 to 200. In the US they were up from 48 to 83, a 73% increase.

Globally, ISP outages increased from 119 to 129, up 8%. They were up 65% in the US, from 40 to 66.

Overall cloud-provider network outages increased from five to eight, but in the US they dropped from four to one.

For the first time since late September, there were zero collaboration app network outages anywhere in the world.

A notable outage occurred Dec. 2 when Level 3 Communications experienced a 14-minute outage that affected several downstream providers as well as Level 3 customers in the UK and Canada. The outage was first observed around 1:10 a.m. PST, centered on Level 3 nodes in Seattle, WA. Service was restored to many of the customers and providers after five minutes, and the outage was cleared at 1:25 a.m. PST.

Update Nov. 30

During the last week outages worldwide across all three categories decreased from 306 to 159, a 48% drop. In the US, they decreased 75%, from 193 to 48.

Globally, the number of ISP outages decreased by 54%, from 256 to 119. In the US they dropped 77%, from 176 to 40.

Cloud-provider network outages decreased overall from eight to five, a 38% decrease. In the US, they went up one, from three to four.

Collaboration-app network outages decreased from 4 to 1 worldwide, and in the US, the number dropped from 3 to 1.

A notable outage occurred on Nov. 25 when Kinesis, a key AWS service, suffered a day-long outage that affected other AWS services and many of its customers who rely on these services to run their businesses (including iRobot’s Roomba vacuum cleaner app). The outage was not network related, and ThousandEyes tests did not detect an elevation in packet loss during the incident. AWS later described the root cause as related to an operating system configuration in a detailed incident post-mortem.

Update Nov. 23

Total outages in all three categories were up 20% globally over the week before, from 256 to 306. In the US, the total rose 60%, from 121 to 193.

ISP outages globally were up 28%, from 200 to 256, and up 71%, from 103 to 176 in the US.

Globally, cloud-provider network outages decreased from 12 to 8, a 33% decrease. In the US the number remained at three for the fourth week in a row.

Collaboration app network outages worldwide increased from three to four. The US number was three, just like the week before.

There were two notable outages during the week. On Nov. 17, Cogent Communications experienced an outage that lasted over two hours, affecting several downstream providers as well as Cogent customers globally. The outage consisted of two incidents over a three-hour period. The first was observed just after 3 a.m. EST and lasted around 48 minutes. It was observed in Cogent nodes in San Francisco, California, and Oakland, California, as well as Seattle, Washington. It affected access to organizations including Microsoft and ON24. Five minutes into the outage it expanded to nodes in locations including Salt Lake City, Utah; Denver, Colorado; Chicago, Illinois; Portland, Oregon; Los Angeles, California; and Cleveland, Ohio. This in turn affected a number of networks in the US and other countries. The number of Cogent nodes displaying symptoms decreased until around the 48-minute mark, when the only nodes still displaying outages were those in San Jose, California.

The second outage was observed at around 4:15 a.m. EST, 20 minutes after the first one cleared. Though it lasted 90 minutes, the second outage was centered only on nodes in San Francisco and Oakland, California. Networks affected included the California-based unified communications provider 8x8, as well as TikTok (ByteDance) and Microsoft. Click here for an interactive view of the outage.

Another notable outage occurred Nov. 18 at about 3:25 a.m., affecting PCCW Global and some of its US East customers and partners using its network to access services including Twitter, TiVo, Ellie Mae and Verizon’s AirTouch. The outage lasted around 40 minutes and occurred as two incidents across an 80-minute period. Both incidents appeared to be focused on PCCW nodes in Ashburn, Virginia. The first incident began at around 3:25 a.m. EST and lasted 13 minutes. The second incident was observed 35 minutes later and lasted around 24 minutes. The outage was cleared at around 4:45 a.m. EST. Click here for an interactive view of the outage.

Update Nov. 16

Global outages in all three categories increased 2%, from 251 to 256, and in the US they jumped 26% from 96 to 121.

ISP outages globally were up 1% from 198 to 200, while in the US they increased 23% from 84 to 103.

Outages in public cloud provider networks worldwide decreased 20% from 15 to 12 and stayed the same in the US at three.

Collaboration app network outages decreased from four to three globally, all of them in the US.

There were two noteworthy outages during the week. Microsoft suffered an outage at 1:20 p.m. EST Nov. 10 that lasted five minutes. It affected users in countries including the US, Mexico, Ireland, Russia, and China, and it was centered in Microsoft infrastructure in Des Moines, Iowa, and Cleveland, Ohio. Click here for an interactive view of the outage.

A Verizon outage centered at facilities in Kansas City, Kansas, and Newark, New Jersey, started at 2:25 a.m. EST, causing slow page loads for users. The New Jersey outage lasted five minutes and affected users on the US East Coast, and the Kansas City outage lasted 10 minutes. Click here for an interactive view of the outage.

Update Nov. 2

Globally, outages observed across all three categories decreased from 227 to 214, a 6% decrease compared to the week prior. In the US, total outages decreased from 121 to 85, a 30% decrease compared to the week prior.

ISP outages worldwide decreased from 184 to 151, an 18% decrease. In the US, the number dropped from 107 to 67, a 37% decrease.

Cloud-provider network outages increased from 9 to 29, a 222% increase. In the US, outages increased from one to three.

Collaboration-app network outages globally dropped from five to one, an 80% decrease compared to the week prior, and in the US they dropped from three to zero.

A notable outage was suffered by Cogent Communications on Oct. 24, affecting downstream providers and Cogent customers worldwide. The outage took place in two incidents over a 60-minute period, the first lasting 24 minutes and affecting Cogent nodes across the US, including those in Washington, DC; New York, NY; Atlanta, GA; Dallas, TX; Los Angeles, CA; and San Francisco, CA. The second started about 15 minutes after the first ended and lasted about eight minutes, hitting the same locations. Click here for an interactive view of the outage.

Update Oct. 26

Globally, the number of total outages in all three categories decreased 4%, while US outages overall increased 9% compared to the week prior.