Data center redundancy rests on two key concepts: capacity components and distribution paths. People commonly make the mistake of thinking about just one of these, but they are far from the same.
A distribution path is a passive component rather than a capacity component. It carries something from one point to another, whether that is cooling, fuel, or power to a server.
Capacity components, on the other hand, are active components that create something. The diesel for a generator counts as a capacity component, for example, because without it the generator could not produce power. Likewise, the generator itself is a capacity component. An air conditioning unit is a capacity component too, as it creates the cooling. Even a UPS system is a capacity component.
The term data center redundancy refers to a system that is designed in such a way that downtime doesn’t necessarily result from unexpected events. These systems are built to withstand problems without interrupting performance, with the aim of eliminating downtime and keeping companies online at all times.
The best redundancy plans do this very effectively, despite circumstances that would prove catastrophic for ill-prepared systems.
Why does data center redundancy matter?
Users are increasingly intolerant of downtime. Today’s businesses rely heavily on digital systems for a wide range of tasks and processes. So when downtime happens, it can significantly hamper the service that customers receive, leaving them dissatisfied and potentially pushing them toward a competitor.
Downtime isn’t something that many businesses give a lot of thought to, because when everything is working as it should, it doesn’t appear to matter much. But when a system is down, it really does matter. If a business feels pain when something goes down, and needs to know how long it will be out for, then redundancy is vitally important.
A key thing to consider when we think about data center redundancy and why it’s so important is the team’s focus. Is the IT team focused on serving customers and improving the capabilities of the business, or is it tied down to serving the infrastructure it works with?
Supply chain issues are another thing to consider when determining the importance of data center redundancy. Redundancy is effectively spare stock that is already on site. If there’s a critical failure on a generator, you could be waiting a year or even longer for a replacement. So today’s businesses really need to think about whether they could manage without a generator for a year if they don’t have redundancy.
Companies have long estimated their own maximum tolerable period of disruption (MTPD), and this figure has always played a role in redundancy planning. But the time frames involved are growing shorter, and companies are now having to do more to limit their downtime in order to succeed.
Data is a key consideration in all redundancy planning. With businesses now handling more and more data, and customers increasingly aware of the data that they hand over and how this is looked after, the security and safety of data is enormously important. Redundancy planning is vital in ensuring that this data is secure, and that the risk of a serious data leak is minimal.
Downtime isn’t just a problem in terms of customer service. It’s also incredibly costly for the business as a whole. While staff are distracted by system outages, they’ll be left unable to fulfill their responsibilities. Organizations may also be fined for serious data leaks, and these events can have a longstanding effect on a company’s reputation, affecting its profitability long term.
2N vs. N+1: What’s the difference?
To gain a full understanding of data center redundancy, you’ll need to know about terms such as 2N and N+1, and how they differ. So let’s take a look at what these terms mean.
What does N mean?
The term ‘N’ in the context of data center redundancy refers to the baseline set of units a facility needs, and it is these units that redundancy plans duplicate. Typically, they include cooling units and generators, both of which are an important part of data center redundancy planning. ‘N’ is equal to the power, backup, and cooling requirements of a facility when it is working at full capacity.
What does N+1 mean?
A simple ‘N’ means the amount of capacity required to keep a facility up and running. N+1 means that capacity plus one extra component, which can be used to avoid catastrophic problems when the unexpected happens. With N+1, the system will stay up and running even if a single component fails or needs maintenance.
What does 2N mean?
If a system is 2N, this means that it is fully redundant. The term 2N is given to mirrored systems, which use a pair of independent distribution systems that aren’t connected to each other. Neither system relies on the other to work, so if one goes down, the other can carry the full load and help a business avoid downtime. A 2N system is designed to keep running even if there’s a loss of power or a fault within a component.
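To make the notation concrete, here is a minimal Python sketch that labels a set of capacity components relative to N. The function name `redundancy_label` and the unit counts are our own illustration, not an industry tool, and the sketch only counts units. A true 2N design also needs independent distribution paths, which a simple count cannot confirm.

```python
def redundancy_label(required_units: int, installed_units: int) -> str:
    """Label installed capacity relative to N, the units needed at full load.

    Hypothetical helper for illustration: it counts capacity components only
    and says nothing about whether distribution paths are independent.
    """
    if required_units <= 0:
        raise ValueError("required_units must be positive")
    if installed_units < required_units:
        return "below N"
    if installed_units >= 2 * required_units:
        # A full second set of units; note that with N = 1, 2N and N+1
        # describe the same unit count.
        extra = installed_units - 2 * required_units
        return "2N" if extra == 0 else f"2N+{extra}"
    spare = installed_units - required_units
    return "N" if spare == 0 else f"N+{spare}"


# Example: a facility that needs 4 generators at full load.
for installed in (4, 5, 8, 9):
    print(installed, "installed ->", redundancy_label(4, installed))
```

With four units required, the loop labels 4, 5, 8 and 9 installed units as N, N+1, 2N and 2N+1 respectively.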
Understanding data center redundancy
Getting to grips with terms like N+1 and 2N is a good start, but far more in-depth knowledge is required to determine the real redundancy of a data center, because reliability doesn’t rise in line with an increase in these numbers.
Adding 15 generators would give you N+15, but would it improve reliability by the same degree? The answer is no, because the term N+15 doesn’t tell you everything you need to know about the reliability of the whole system.
There could be statistical bottlenecks in other components in the facility, and in the distribution paths. And as a practical point, if you run diesel generators too lightly loaded, at around 35% of rated capacity or below, you risk something called wet stacking, where unburned fuel builds up in the exhaust system. That alone can increase the likelihood of failure. So, ironically, not only has N+15 not contributed to redundancy, it could actually increase the risk of a failure.
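The light-loading point can be checked with simple arithmetic. The sketch below is an illustration only: the site load, generator rating and the 35% threshold are assumed figures (the threshold is taken from the paragraph above, and real limits vary by engine), and it assumes every installed unit runs in parallel and shares the load evenly, which may not match how a given site operates its standby units.

```python
def load_fraction_per_unit(site_load_kw: float, unit_rating_kw: float,
                           running_units: int) -> float:
    """Fraction of its rating that each unit carries if load is shared evenly."""
    if running_units <= 0 or unit_rating_kw <= 0:
        raise ValueError("running_units and unit_rating_kw must be positive")
    return site_load_kw / (unit_rating_kw * running_units)


# All figures below are hypothetical.
SITE_LOAD_KW = 2000.0        # assumed site load
UNIT_RATING_KW = 500.0       # assumed rating of each generator, so N = 4
LIGHT_LOAD_THRESHOLD = 0.35  # assumed from the ~35% figure in the text

for running in (4, 5, 8, 19):  # N, N+1, 2N, N+15
    fraction = load_fraction_per_unit(SITE_LOAD_KW, UNIT_RATING_KW, running)
    note = "light load, wet-stacking risk" if fraction < LIGHT_LOAD_THRESHOLD else "ok"
    print(f"{running:2d} units running: {fraction:.0%} of rating each ({note})")
```

Under these assumptions, four or five units stay comfortably loaded, while nineteen units sharing the same load each sit well under the threshold.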
That’s just one example of how focusing too much on N+(X) can lead businesses into a greater level of risk, as the system hasn’t been fully thought through.
So, when we think about data center redundancy, we really need to focus on the big picture. Factors like these can’t be considered in a vacuum, because on their own they’ll never tell the full story – and companies could well be caught out as a result.
A system is only ever as strong as its weakest link. If everything in your facility feeds its load through a single breaker, the facility will only ever be as reliable as that breaker. It’s statistically bottlenecked by that breaker.
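The bottleneck effect can be sketched with basic probability. The example below is a simplified model under stated assumptions: the availability figures are invented, failures are treated as independent, and the generator bank and a single breaker are treated as two stages in series. The function names are our own.

```python
from math import comb


def k_of_n_availability(needed: int, installed: int, unit_availability: float) -> float:
    """Probability that at least `needed` of `installed` independent units are up."""
    return sum(
        comb(installed, up)
        * unit_availability ** up
        * (1 - unit_availability) ** (installed - up)
        for up in range(needed, installed + 1)
    )


def series_availability(*stage_availabilities: float) -> float:
    """Availability of stages in series: every stage must be up at once."""
    result = 1.0
    for availability in stage_availabilities:
        result *= availability
    return result


# Hypothetical figures for illustration only.
GENERATOR_AVAILABILITY = 0.98
BREAKER_AVAILABILITY = 0.999
NEEDED = 4  # N

for spare in (0, 1, 2, 15):
    bank = k_of_n_availability(NEEDED, NEEDED + spare, GENERATOR_AVAILABILITY)
    system = series_availability(bank, BREAKER_AVAILABILITY)
    print(f"N+{spare}: generator bank {bank:.6f}, whole system {system:.6f}")
```

However many spare generators are added, the whole-system figure in this model can never exceed the availability of the single breaker, and most of the gain arrives with the first spare or two.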
Instead of thinking about individual capacity components in a vacuum, focus on the design objectives and the functional reliability of the system. By asking the right questions, you’ll be able to determine what redundancy really looks like.
Future-proof redundancy
It’s also worth thinking about future proofing as you investigate further. Remember, what’s N+1 today might not be N+1 tomorrow.
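As a rough illustration of how N+1 can erode, the sketch below projects a growing load against a fixed set of generators. The starting load, unit rating and 10% annual growth rate are all assumed numbers, and `spare_units` is a hypothetical helper.

```python
import math


def spare_units(load_kw: float, unit_rating_kw: float, installed_units: int) -> int:
    """Installed units minus the units needed to carry the load (N)."""
    needed = math.ceil(load_kw / unit_rating_kw)
    return installed_units - needed


# Hypothetical figures for illustration only.
UNIT_RATING_KW = 500.0
INSTALLED_UNITS = 5
ANNUAL_GROWTH = 0.10
load_kw = 1900.0  # assumed load today

for year in range(5):
    spare = spare_units(load_kw, UNIT_RATING_KW, INSTALLED_UNITS)
    if spare > 0:
        label = f"N+{spare}"
    elif spare == 0:
        label = "N"
    else:
        label = "below N"
    print(f"year {year}: load {load_kw:.0f} kW -> {label}")
    load_kw *= 1 + ANNUAL_GROWTH
```

In this made-up scenario the same five generators are N+1 today, plain N a year later, and short of N within a few years.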
The problem is that, as a whole, data centers are not designing for reliability with future needs in mind, so businesses need to look closely at what their company needs now and what it might need in the future to make the best choice. This is a highly nuanced topic, which is heavily region-specific and load-specific.
Most of the time, the best systems are built at scale with the help of experts. If you approach a new data center and hear lots of talk about N+(X), we’d advise you to continue looking. Experts in the field will talk first and foremost about design reliability and the right design objectives for your business, because these are the things that matter most when it comes to keeping businesses online.
Communicate risk effectively
If your business would not be comfortable with planned or unplanned downtime of up to 12 hours, it’s vital that you find a reputable data center that can support your requirements and keep you online.
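One way to make a downtime tolerance easier to discuss is to convert it into an availability percentage, and back again. The sketch below assumes the tolerance is a total per year, which is our assumption for the sake of the arithmetic, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_from_downtime(hours_down_per_year: float) -> float:
    """Availability implied by a yearly downtime budget."""
    return 1 - hours_down_per_year / HOURS_PER_YEAR


def downtime_from_availability(availability: float) -> float:
    """Hours of downtime per year implied by an availability figure."""
    return (1 - availability) * HOURS_PER_YEAR


# A 12-hour yearly budget expressed as a percentage.
print(f"12 h/year -> {availability_from_downtime(12):.2%} availability")

# Common availability targets expressed as hours per year.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {downtime_from_availability(target):.2f} h/year")
```

On that basis, 12 hours a year works out to roughly 99.86% availability, which gives executives and operations teams a shared number to discuss.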
Bear in mind that poorly communicated risk is a leading cause of misunderstanding, particularly between executives and those running operations on the ground.
Remember, what’s right for your business is highly contextual. We can’t advise you on the best case for any business without getting to know your company and its requirements. But we would say that all businesses should focus on properly communicating risk and increasing reliability wherever needed.
Data center redundancy tiers
Redundancy is key to understanding how reliable a data center is and how well it performs. However, it can be delivered in a number of ways, so it’s sometimes tricky to compare the redundancy of data centers, which might operate quite differently.
If in doubt, the Uptime Institute’s data center tiers can assist. Renowned as a standard bearer of performance in digital infrastructure, the Uptime Institute certifies data centers according to its Tier Classification System. There are four distinct tiers: Tier I, Tier II, Tier III and Tier IV.
Each tier of the Uptime Institute’s classification system is very specific in the requirements that must be met in order to achieve this ranking. Data centers must have set capabilities and minimum service levels. Factors such as staff expertise, customer support and maintenance protocols are also taken into account.
Data center redundancy best practices
Data center redundancy provides protection to all components, to keep businesses online as much as possible. However, it’s always worth focusing on the protection of the most vital components first.
Power supplies should be prioritized and duplicated, particularly given that natural disasters and extreme weather events are on the rise. Data centers need uninterruptible power supplies to function. Without them, the duplication of other components would be futile – no matter how many copies were made.
We also recommend that the following systems be duplicated:
- Backup generators
- Cooling systems
- Individual servers and their associated data
Third-party support can be used with redundant data centers to eliminate further risks and provide added protection for critical processes.
Geo-redundancy is another design that’s well worth considering. This design provides added protection from natural disasters and weather events, meaning that if a hurricane, flood, or huge storm did take a data center offline, the company would theoretically be able to continue to serve its customers without interruption.
This is particularly beneficial to global businesses, whose customers in one region might be completely unaware of the extreme weather battering a data center on another continent.
Data center redundancy design
Redundancy can be designed in many different ways. It’s built into the infrastructure of data centers, with critical components such as cooling systems, backup generators and UPS systems all being duplicated in the first instance.
Data centers have different levels of redundancy, which gives us a glimpse into how well they might avoid unexpected downtime. For many businesses, 100% uptime is essential; however, fully redundant options tend to be more expensive and might not really be necessary for some companies.
Redundancy is now achieved through many different configurations, giving a range of options that businesses can choose from to balance their focus on uptime without overstretching their budgets.
Progressive levels of protection can be purchased according to the specific requirements of companies and their own risk tolerance.
Data center redundancy pricing
Redundancy doesn’t come cheap – but its cost often pales in comparison to the potential cost of unexpected downtime.
To protect profits and keep customers singing their praises, businesses need to carefully weigh up what downtime might mean for their processes, considering how budgets can be best spent to protect the business as a whole.
Businesses need to find the perfect balance between reliability and cost to create the redundant architecture that best suits their requirements. This isn’t always easy.
We recommend mapping your needs to an appropriate redundancy model, checking that your chosen data center offers the right level of protection and uptime guarantees for your business needs – and your budget.
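The balance between reliability and cost can be roughed out before you speak to a provider. The sketch below is purely illustrative: the availability figures attached to each option, the annual price premiums and the hourly cost of downtime are all invented, and they are not Uptime Institute figures or quotes from any data center.

```python
HOURS_PER_YEAR = 8760


def expected_annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime for a given availability."""
    hours_down = (1 - availability) * HOURS_PER_YEAR
    return hours_down * cost_per_hour


# Hypothetical inputs: (assumed availability, assumed annual premium).
OPTIONS = {
    "N": (0.995, 0),
    "N+1": (0.9995, 60_000),
    "2N": (0.9999, 150_000),
}
COST_PER_HOUR = 10_000  # assumed cost of one hour of downtime

for name, (availability, annual_premium) in OPTIONS.items():
    downtime_cost = expected_annual_downtime_cost(availability, COST_PER_HOUR)
    total = downtime_cost + annual_premium
    print(f"{name}: downtime {downtime_cost:,.0f} + premium {annual_premium:,} = {total:,.0f}")
```

With these made-up inputs the middle option comes out cheapest overall, but a higher hourly cost of downtime would tip the result toward the more redundant design, which is why the inputs need to come from your own business.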
Learn more about data center redundancy
At TRG Datacenters, we specialize in providing robust redundancy solutions to safeguard your operations against unexpected failures.
From understanding the different redundancy tiers to implementing the right strategies for your business, our experts are here to help you navigate the complexities of data center redundancy.
If you’re unsure about what your company might need, or you have further questions about fault tolerance and redundancy, don’t hesitate to get in touch with our team.