Human Error & Alarm Fatigue

Houston Data Center

Recent research published by Uptime shows that both the frequency and severity of data center outages are now on the decline.

In 2023, just one in 10 outages was categorized as serious or severe, with operators reporting that 41% of outages in the past three years were negligible. This is an improvement of four percentage points from 2022 and 10 percentage points from 2021.

Human error remains a real issue in data centers, however, and now comes second only to power as the primary driving force in outages. Over the past 25 years, Uptime estimates that direct or indirect instances of human error account for between two-thirds and four-fifths of all downtime incidents.

When we look at these cases in more detail, we can see where most of the errors occur. In 48% of cases, data center staff failed to follow procedures or processes. Incorrect staff processes were behind 45% of outages, while installation issues were at fault in 23% of cases.

Reducing Alarm Fatigue & Protecting Our Employees

Alarm fatigue is a significant issue in our industry, and it must be addressed if we are to really tackle the human error problem.

When everything is an emergency, people become numb and callous toward emergencies. The resulting lack of empathy can cause major problems in customer service interactions, and it can also cause unnecessary stress and turnover within an organization.

Staff turnover and training are among the largest and riskiest issues in a mission-critical organization. It is incredibly important to understand deeply what qualifies as an emergency, to filter requests accordingly, and to apply pressure to your team members appropriately.

By reducing our emergency ticket volume and by building a truly fault-tolerant infrastructure (which we see as the cornerstone of digital transformation and of any transformation effort), we have almost eliminated the “interruption emergency culture” that is pervasive and problematic in NOCs today.

Our focus on job satisfaction and our hybrid NOC member model create higher-paying NOC jobs (our average is currently 25% above the industry average for our city). Developing long-term team members at the most basic level of the organization is a crucial building block for educating and building a successful organization.

These NOC members are often non-traditional hybrid candidates rather than the old “Help Desk Guy” model, and they contribute to the TRG Difference in many different ways. Our NOC team has become a defining feature of our organization; it is where the rubber hits the road for TRG.

Key Attribute: Free Support

We offer a non-emergency, non-SLA’d “Best Effort” remote hands-on tier that is entirely free. We also provide a paged, SLA’d emergency ticket queue that is billed in 15-minute increments. Over the last year, we have recorded zero emergency tickets, and we have recorded three paged tickets since the inception of the facility.
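To make the two tiers concrete, here is a minimal sketch of how a ticket charge could be worked out under this model. It is an illustration only, not our actual billing system: the queue names, the hourly rate parameter, and the rule of rounding time up to the next increment are all assumptions made for the example. Only the free Best Effort tier and the 15-minute increment come from the description above.

```python
import math

# From the text: the emergency queue is billed in 15-minute increments.
INCREMENT_MINUTES = 15


def billable_minutes(worked_minutes: float) -> int:
    """Round time worked up to the next whole increment.

    Rounding up is an assumption; the article does not state the rule.
    """
    if worked_minutes <= 0:
        return 0
    return math.ceil(worked_minutes / INCREMENT_MINUTES) * INCREMENT_MINUTES


def ticket_charge(queue: str, worked_minutes: float, hourly_rate: float) -> float:
    """Return the charge for one ticket.

    `queue` is a hypothetical label: "best_effort" or "emergency".
    `hourly_rate` is a hypothetical parameter; no rate is given in the article.
    """
    if queue == "best_effort":
        # The Best Effort tier is free regardless of time spent.
        return 0.0
    if queue == "emergency":
        return billable_minutes(worked_minutes) / 60 * hourly_rate
    raise ValueError(f"unknown queue: {queue!r}")
```

Under these assumptions, a 20-minute emergency ticket would be billed as two increments (30 minutes), while a Best Effort ticket of any length costs nothing.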

Getting away from the mental static and alarm fatigue allows us to look beyond the NOC room desk and keep thinking about how to improve the customer experience. Making remote hands-on support free has empowered us to think about what else we can do to help our customers, since it’s free anyway!