Why did the internet and cell phones go out in Houston during Hurricane Beryl?

Hurricane Beryl

TLDR: Just about everything that could go wrong, did go wrong

Hurricane Beryl Lead Up – Wind Event Conversation

It is no secret that Hurricane Beryl recently caused quite a bit of disruption to the Houston market. The power was out in many areas, but there was an interesting piece of collateral damage: the internet. We received a lot of questions from citizens around Houston: why did the internet (and cell towers) go out?

An important note to the readers: As the hurricane approached the coast, we were lucky that the pervasive dry air delayed intensification. This could have easily turned into a much stronger storm.

Initial Damage

Hurricane Beryl bore down on Houston, releasing heat equivalent to a 10 megaton nuclear weapon every 20 minutes, for hours. This damaged utility poles, transmission lines, and some substations, causing widespread power outages. There was also limited structural damage to some buildings; however, the star of the show here was the power lines.

An interesting fact that most don’t realize about Houston: a large share of fiber internet connections are buried, but an equally large portion run aerially along the power lines. As trees came down and poles snapped, the resulting fiber cuts caused pervasive damage to important fiber arteries that carry data around the city.

Facilities Damage and Power Outages

Network carriers have distributed networks throughout the city with “nodes” and “gateways”. Some of these reside in ultra-high-reliability data centers, such as TRG; some are in self-maintained telecommunications-grade facilities (not life-critical grade); and some are in regular buildings with little to no protection beyond backup batteries in a rack.

During Hurricane Beryl, most of the non-data-center buildings lost utility power immediately. This is when the clock starts on the batteries. A lot of telecom carriers spec batteries in lieu of generators in a distributed, redundant fashion: they spread their PoPs (points of presence) around, counting on the fact that not all of the buildings will lose power at the same time (even with no generators, which is not appropriate for Houston), and have about 8-12 hours of battery before they go offline. This is why some carriers started to go offline late in the evening after the hurricane, and why cell phone reception continued to degrade: the batteries began to fail as the loss of power to each site persisted. We heard of one heroic local effort where a crew was bringing in and changing batteries every 8 hours to try to keep a site online.
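To make the battery timer concrete, here is a minimal Python sketch of how a fleet of battery-only sites drops off once utility power is lost. The fleet list and the `sites_online` helper are hypothetical and only illustrate the 8-12 hour range described above; they are not measurements from any carrier.

```python
def sites_online(battery_hours, hours_since_outage):
    """Count sites whose battery runtime exceeds the time since power was lost."""
    return sum(1 for runtime in battery_hours if runtime > hours_since_outage)


# Hypothetical fleet: ten battery-only sites with runtimes between 8 and 12 hours.
fleet = [8, 8, 9, 9, 10, 10, 11, 11, 12, 12]

for hour in (0, 6, 8, 10, 12):
    online = sites_online(fleet, hour)
    print(f"hour {hour:>2}: {online} of {len(fleet)} sites online")
```

In this sketch nothing fails for the first several hours, and then the whole fleet drops out over a span of about four hours. Spreading sites around the city does not help when every site loses utility power at once and stays without it longer than its battery lasts.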

At the same time, some of the telecommunications-grade facilities, which aren’t built with the same redundancy as enterprise-grade facilities, began to suffer mass failures because generators did not start. We heard of about 4 carriers that failed in this fashion, losing their entire self-maintained carrier-grade facilities. While some just went offline, one actually had runaway heat buildup that caused its core routing equipment to overheat and fail en masse, further complicating the recovery.

As a result, many of the telecommunications-grade facilities failed either immediately or eventually. They were likely not architected for the kind of mass, extended power failure that is typical in Houston, nor did they appear to be properly maintained: most of the failures appeared to be due to human error or a lack of maintenance best practices.

Why did cell phone reception get so bad?

Let’s start with the fact that about 50% of the internet providers in Houston lost service citywide or across a submarket at some point. This degrades service for everyone. On top of that, people lean on cellular service more heavily during an outage, and this surge in demand coupled with the loss of capacity causes localized network degradation on cell towers.

Further complicating things, most cell towers run on backup batteries or a small diesel generator, so a similar timer applies: towers progressively fail as the power stays out. We also received information that one of the major cell phone companies lost its entire self-built carrier-grade data center for multiple days, so you may have seen differences between carriers depending on their backhaul and ability to deliver services.

Once the total number of towers available goes down, the backhaul decreases, and the number of people using the service remains the same, cell coverage degrades to an ultimately useless point, despite best efforts at spot coverage with mobile trucks, etc.
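The squeeze described above comes down to simple arithmetic: demand stays flat while the capacity behind it shrinks. The Python sketch below illustrates this with a hypothetical `load_ratio` helper and made-up numbers (user count, towers, per-tower capacity, and backhaul fraction are all assumptions, not carrier data).

```python
def load_ratio(active_users, towers_online, users_per_tower, backhaul_fraction=1.0):
    """Demand divided by remaining capacity; values above 1.0 mean oversubscribed."""
    capacity = towers_online * users_per_tower * backhaul_fraction
    if capacity <= 0:
        return float("inf")  # no usable capacity left at all
    return active_users / capacity


# Hypothetical submarket: 100,000 users, 100 towers, 1,500 users per tower.
normal = load_ratio(100_000, towers_online=100, users_per_tower=1_500)

# Same users after the storm: 40 towers still powered, half the backhaul intact.
storm = load_ratio(100_000, towers_online=40, users_per_tower=1_500,
                   backhaul_fraction=0.5)

print(f"normal load ratio: {normal:.2f}")
print(f"storm load ratio:  {storm:.2f}")
```

With these assumed numbers the network goes from comfortably under capacity to several times oversubscribed, which is the point where phones show signal bars but calls and data stop working.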

Recovery Process Creating Additional Damage and Delaying Access To Fix

The primary and ultimately most important thing to get back online is, without a doubt, the power. This matters not only for life safety, air conditioning, and many other reasons, but also because these carriers, cell towers, etc. ultimately rely on utility power in the absence of more comprehensive backup generation.

While the power recovery efforts were being led by linemen, the fiber providers were rightly deprioritized for access to repair fiber cuts and damaged fiber strands. The fiber cuts were also propagated further by the linemen doing their work and having to re-string and re-pole. Damage was observed not only to aerial fiber but also to buried fiber, likely due to drilling to place additional poles.

Consider the fact that >10,000 linemen descended on the City of Houston, many of whom had not worked here before. This kind of collateral damage is natural and unavoidable. Ultimately, we saw fiber damage continue to occur for weeks after the hurricane, which did locally impact internet providers.

Takeaways

Distributed low-end redundancy works in a lot of cities, but it’s not architected to be successful in Houston. If you have business-critical needs, ask your carrier whether their gateways are popped in a data center. Self-managed carrier-grade facilities proved insufficient for multiple reasons. Cell towers may benefit from natural gas solutions at the edge. Battery backup may work in some cities but is not appropriate for Houston.

The initial event is just the beginning of issues for the internet with a hurricane, with the worst coming 12 hours after the initial loss of power. Prepare accordingly for this.

Lingering and new impacts can occur for weeks afterwards due to recovery efforts and fiber cuts.

We heard reports that hospitals with power had their important EHR and ambulatory systems entirely offline because they had no internet. We at TRG think it’s time to re-frame what critical load means from a statutory designation perspective, and standards should be established that require carriers to build to meet the needs of Houston. Carriers and multi-tenant data centers play an incredibly important role in infrastructure, and to our knowledge the multi-tenant data centers in Houston, for the most part, performed without failure. The carriers need to meet the same standards and serve the market appropriately.

We are thankful to have had 13 carriers at our data center. While we observed multiple carriers affected at any given time, our clients had the benefit of access to many carriers, underscoring the value that data centers provide to a community.

As the world continues to digitize, the internet is an increasingly important part of every aspect of our lives.