Risk Assessment, BIA, SLAs, RTOs, and RPOs: What’s the Link? MTD and MTDL
Risk assessment, business impact analysis (BIA), and service level agreements (SLAs) are indispensable to the development and implementation of business continuity and disaster recovery (BCDR) plans. As such, organizational leaders must thoroughly understand these terms: their similarities and differences, and how to leverage them to safeguard their business operations from threats and disruptive events.
Differentiating Between Risk Assessment (RA) and Business Impact Analysis (BIA)
Although risk assessment and BIA are sometimes used interchangeably and are quite similar in many respects, there are several differences between the two. To get a clear sense of what each brings to the business continuity table, let’s look at the differences between risk assessment and business impact analysis.
What Is Risk Assessment?
Risk assessment identifies all the threats and vulnerabilities that make up risks that could negatively impact an organization’s operations, as well as its reputation, employees, and more. It is primarily concerned with detecting the sources of disruption, possible areas of impact, and the likelihood that a potentially disruptive event will occur.
Risk assessment correlates the potential sources and likelihood of risks, the potential impact of disruptive events, and failure nodes that are most at risk. It also documents existing strategies and measures already in place to mitigate the impact of said risks.
A frequent outcome of a risk assessment is the generation of a risk map to help visualize and prioritize the risks associated with the business, such as the two-dimensional matrix shown below.
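As a rough illustration of how a two-dimensional risk matrix can be turned into a ranking, the sketch below scores each risk as likelihood times impact. The 1-to-5 scales, the tier thresholds, and the sample risks are assumptions made for this example; they do not come from any standard, and real programs define their own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One entry on the risk map (scales are hypothetical)."""
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain)
    impact: int      # 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Position in the matrix reduced to a single number.
        return self.likelihood * self.impact


def tier(risk: Risk) -> str:
    """Map a score to an illustrative tier; thresholds are assumptions."""
    if risk.score >= 15:
        return "high"
    if risk.score >= 6:
        return "medium"
    return "low"


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Sample data, invented for the example.
risks = [
    Risk("Office printer failure", likelihood=4, impact=1),
    Risk("Ransomware attack", likelihood=3, impact=5),
    Risk("Regional power outage", likelihood=2, impact=4),
]

for r in prioritize(risks):
    print(f"{r.name}: score={r.score}, tier={tier(r)}")
```

Multiplying the two axes is only one common convention; some teams weight impact more heavily or read the tier straight from a lookup table.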
What Is a Business Impact Analysis?
A business impact analysis is an assessment of the financial, operational, legal, and other repercussions for an organization of a major impairment or loss of a business function or process. A BIA report documents the worst-case scenarios and provides an overview of the rate, type, magnitude, and consequences of such an event. This analysis helps stakeholders and business continuity planning teams arrive at recovery timeframes and the steps needed to fortify operations and internal resources against the projected impacts.
What’s the Difference Between Business Impact Analysis and Risk Assessment?
You can undertake a BIA without risk assessment, but every risk assessment involves some sort of business impact analysis.
A business impact analysis explains the effects and severity of the loss of key business functions and/or processes, regardless of what is responsible for that loss. It doesn’t matter what caused the loss of the business function or process. What counts is understanding the impact of the loss in order to determine the recovery plan and the timeframes to resume operations.
A risk assessment analyzes the potential threats and vulnerabilities that make up a risk, then assesses the likelihood of that risk occurring. It also spells out how the business would be affected and which resources and functions would be impacted. This leads to the prioritization (i.e., tiering) of these risks.
It also helps business leaders determine how a specific threat will affect business operations. Essentially, risk assessment identifies potential risks, assesses their severity, and determines the best course of action to mitigate or eliminate them.
When combined, BIA and RA enable a business to focus on the most critical risks or threats based on their likelihood and impact.
What is a Service Level Agreement (SLA) in Business Continuity?
An SLA is an agreement between a customer and a service provider/vendor delineating a set of services, expected performance metrics, and responsiveness levels. The agreement could also be internal, between two different departments within an organization.
Within the context of business continuity, an SLA represents a promise about how long, at most, a business process or function will remain unavailable in the event of a disruption, and it assumes the commitment of every party involved.
Through the BIA, an organization will estimate the downtime it can tolerate for a given process or function. This will be reflected in the SLA for that process.
When considering IT systems, an SLA helps organizations conduct high-level risk assessments by detailing the requirements for availability, reliability, and the acceptable number of outages for the service provided. For example, the SLA may specify a service requirement of 99.5% guaranteed uptime and allow a maximum of two outage events per year, each lasting no more than three hours. The SLA may also specify remediation measures or recovery protocols depending on the peculiarities of the outage event.
Such an SLA will help DR teams determine if the quality of infrastructure can reliably support expected service levels and be redundant enough to adequately facilitate recovery strategies.
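To make those example figures concrete, the short sketch below converts an uptime percentage into a yearly downtime allowance and compares it with the allowance implied by an outage clause. The function names are hypothetical, and the calculation assumes a 365-day year and that both clauses apply at the same time.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def allowed_downtime_hours(uptime_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime allowed by an uptime percentage over the given period."""
    return period_hours * (1 - uptime_pct / 100)


def outage_budget_hours(max_events: int, max_hours_each: float) -> float:
    """Downtime allowed by a 'no more than N outages of H hours' clause."""
    return max_events * max_hours_each


uptime_budget = allowed_downtime_hours(99.5)  # about 43.8 hours per year
event_budget = outage_budget_hours(2, 3)      # 6 hours per year

# When both clauses apply, the smaller allowance is the binding one.
binding = min(uptime_budget, event_budget)

print(f"Uptime clause allows about {uptime_budget:.1f} h of downtime per year")
print(f"Outage clause allows {event_budget} h of downtime per year")
print(f"Binding allowance: {binding:.1f} h per year")
```

Under these assumptions, the outage clause is far stricter than the percentage alone, which is why both figures are worth reading together when inferring recovery objectives from an SLA.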
DR teams can extract or infer recovery time objectives (RTOs) and recovery point objectives (RPOs) that will meet the SLA requirement. This will help drive desirable outcomes for the rest of the business continuity and disaster recovery planning process.
SLA Metrics for Business Continuity: MTD and MTDL
Maximum Tolerable Downtime (MTD)
Maximum tolerable downtime, also referred to as maximum allowable downtime (MAD), is the absolute longest amount of downtime an organization can tolerate before facing serious repercussions.
The MTD for critical business processes is most likely defined during the BIA phase of business continuity planning. It will be a key element of the SLA associated with a particular business process.
The more critical a business process is to sustaining specific business functions, the shorter its maximum tolerable downtime will be.
Breakdown of MTD:
MTD is the sum of the incident response time (IRT), the RTO, and the operational resumption time (ORT) needed to get the business function up and running at an acceptable level.
- IRT is the time that spans from the moment the disruption occurs to the moment the disaster is declared and the disaster recovery plan is activated. How quickly a disruption is detected depends on how effective the monitoring or alerting system (tools and their configurations) is. How quickly it is correctly assessed and the disaster recovery plan is activated depends on staff training and the incident response workflows involved.
- RTO is the time required to get the systems, or the technology infrastructure, back to a normal or acceptable level of operation.
- ORT covers all the other activities that need to take place to resume the business function once all the systems are up and running. This is likely to involve the return of employees (to the office or through remote connection) and the re-activation of all the linkages between the people involved in running the business function. It can also include tasks required to get the applications running properly (re-entry or re-sync of data).
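The breakdown above can be expressed as simple arithmetic. The sketch below is a minimal illustration: the function names and the sample durations are hypothetical. It checks whether a planned IRT, RTO, and ORT fit within an MTD, and derives the RTO that remains once IRT and ORT have been estimated.

```python
from datetime import timedelta


def total_downtime(irt: timedelta, rto: timedelta, ort: timedelta) -> timedelta:
    """Expected length of the outage: IRT + RTO + ORT."""
    return irt + rto + ort


def meets_mtd(mtd: timedelta, irt: timedelta, rto: timedelta,
              ort: timedelta) -> bool:
    """True when the three phases together fit within the MTD."""
    return total_downtime(irt, rto, ort) <= mtd


def rto_budget(mtd: timedelta, irt: timedelta, ort: timedelta) -> timedelta:
    """Time left for system recovery after IRT and ORT are accounted for."""
    remaining = mtd - irt - ort
    if remaining < timedelta(0):
        raise ValueError("IRT and ORT alone already exceed the MTD")
    return remaining


# Sample values, invented for the example.
mtd = timedelta(hours=8)
irt = timedelta(hours=1)
ort = timedelta(hours=2)

budget = rto_budget(mtd, irt, ort)
print(f"RTO must not exceed {budget}")
print("Plan with a 5 h RTO fits:", meets_mtd(mtd, irt, timedelta(hours=5), ort))
print("Plan with a 6 h RTO fits:", meets_mtd(mtd, irt, timedelta(hours=6), ort))
```

Working backward from the MTD in this way shows why faster detection and a well-rehearsed resumption procedure leave more room for the technical recovery itself.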
Maximum Tolerable Data Loss (MTDL)
Similar to MTD, business continuity planning also determines the maximum amount of data or transactions the business can afford to lose for a specific business process or function. This directly tells the DR team which RPO it must achieve to meet the SLA.
SLA Metrics for Disaster Recovery: RPO and RTO
Your Recovery Time Objective and Recovery Point Objective are key parameters that define, respectively, how long your IT operations can afford to be offline and how much data loss your organization can tolerate. Understanding what these parameters are, how they are calculated, and the critical role they play in DR planning is essential to your business continuity efforts.
Recovery Point Objectives
An RPO describes the level of data loss a business process can withstand, expressed as the point in time before a disruption to which data must be recoverable. The Business Continuity Management team uses the MTDL to determine how far back before the crisis that point may lie. The DR team then sets an RPO equal to the MTDL or tighter, depending on the available or selected solution.
Recovery Time Objectives
As mentioned previously, the RTO is the time to get the systems, or the technology infrastructure, functional and back to an acceptable level of operation after an outage. The DR team uses the RTO to inform the choice of technologies and recovery strategies to return to a pre-event state of functionality.
Linking Risk Assessment, BIA, SLAs, RPO and RTO Together
Regardless of terminology, scope, or criticality, BIA, risk assessment, SLAs, RTO, and RPO are all parameters intended to inform and guide BCDR planning efforts.
How Are Risk Assessments, BIA and SLAs Related?
Essentially, BIA and risk assessment reports help organizations develop achievable business continuity SLAs and mitigation plans. These reports also assist in the calculation of accurate RTO and RPO metrics – key components that are embedded into SLAs.
In the example shown below, as part of business continuity planning, a given business process is being assessed through BIA and RA activities.
On one hand, risk assessment will lead to the implementation of specific mitigation actions against some of the highest risks associated with that business process.
On the other hand, BIA will lead to the establishment of a business SLA including some metrics such as MTD and MTDL. These two business continuity metrics will in turn drive the disaster recovery RPO and RTO metric values that the IT organization will have to achieve to be compliant with the SLA.
However, on the IT side, there is more to be considered as part of disaster recovery planning. If several IT applications are involved in the business process under consideration, then the RTOs and RPOs of each individual application will have to be evaluated. Any dependency, such as a required recovery order across applications, VMs, and databases, will also have to be factored in to ensure the resulting RTO and RPO of the overall business process meet the SLA.
This, in turn, may lead to the selection of specific technology and recovery solutions that can deliver the required RTOs and RPOs, whose combined values for the business process will have to be less than or equal to the MTD and MTDL values set in the SLA.
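One way to reason about this roll-up is sketched below. It rests on assumptions that are not part of the original text: applications with no dependency between them recover in parallel, a dependent application starts recovering only once everything it depends on is back, and the process-level RPO is the worst individual RPO. The application names, durations, and limits are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class App:
    name: str
    rto_min: int                      # recovery time for this app alone
    rpo_min: int                      # data-loss window for this app
    depends_on: tuple[str, ...] = ()  # apps that must be recovered first


def process_rto_rpo(apps: list[App]) -> tuple[int, int]:
    """Return (process RTO, process RPO) in minutes under the assumptions above."""
    by_name = {a.name: a for a in apps}
    order = TopologicalSorter({a.name: a.depends_on for a in apps}).static_order()

    finish: dict[str, int] = {}
    for name in order:
        app = by_name[name]
        # An app starts once its slowest dependency has finished.
        start = max((finish[d] for d in app.depends_on), default=0)
        finish[name] = start + app.rto_min

    # The process is back when the last app is back; data loss is the worst RPO.
    return max(finish.values()), max(a.rpo_min for a in apps)


apps = [
    App("database", rto_min=30, rpo_min=5),
    App("app-server", rto_min=20, rpo_min=15, depends_on=("database",)),
    App("web-frontend", rto_min=10, rpo_min=15, depends_on=("app-server",)),
    App("reporting", rto_min=45, rpo_min=60, depends_on=("database",)),
]

RTO_LIMIT_MIN = 90  # hypothetical share of the MTD left for system recovery
MTDL_MIN = 30       # hypothetical MTDL from the SLA

rto, rpo = process_rto_rpo(apps)
print(f"Process RTO: {rto} min (limit {RTO_LIMIT_MIN}) -> ok: {rto <= RTO_LIMIT_MIN}")
print(f"Process RPO: {rpo} min (limit {MTDL_MIN}) -> ok: {rpo <= MTDL_MIN}")
```

With these sample figures the recovery time fits within the limit, but the reporting application's 60-minute RPO exceeds the 30-minute MTDL, so that application would need a tighter protection scheme. If every application had to be recovered strictly one after another, the process RTO would simply be the sum of the individual RTOs.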
As this example shows, all five parameters are interconnected in a loop that flows seamlessly from one to the other to facilitate the disaster recovery and business continuity planning process.
They help Business Continuity Management teams ascertain the necessary tools, methods, strategies, infrastructure, and resources needed to avoid catastrophic losses and guarantee the survival of your business when an emergency strikes.
Essentially, these parameters will help you systematically identify and manage your risks by providing essential information about all areas of threat, causality, failure, error, omission, and so on. Rather than take a reactive approach to crisis management and disaster recovery, leading organizations leverage SLAs, BIAs, and RAs, as well as RTOs and RPOs, to stay ahead of disasters and unplanned outage events.
Introducing Zerto: Delivering RTOs and RPOs to Meet Your Most Stringent SLAs
Zerto is built on a foundation of continuous data protection, providing everything you need for disaster recovery, backup, and cloud mobility to help you streamline business continuity planning efforts across any vertical.
Our best-in-class solution can help you seamlessly achieve the RTO and RPO metrics required to meet your SLA requirements. Zerto brings together everything that’s required to keep your infrastructure protected in a single, simple, and scalable cloud data management and protection solution.
Speak to one of our specialists today to find out how Zerto can streamline your business continuity and DR planning.