Service Level Agreement in Business Continuity
In the context of business continuity, a service level agreement (SLA) defines the maximum amount of time a business process may remain unavailable in the event of a disruption. It also defines the actions that the teams involved in the recovery effort must take and the KPIs they must achieve.
What Is a Service Level Agreement?
An SLA is a critical component of most technology-facing vendor contracts. It is an agreement between a customer and a service provider/vendor that delineates a set of services, expected performance metrics, and responsiveness levels. The agreement could also be internal—between two different departments within an organization.
What Is Included in a Service Level Agreement?
A comprehensive SLA usually includes:
- The specifics of services provided (and any exclusions)
- Standards such as the time window for each level of service (including peak periods and non-peak periods)
- Conditions of service availability
- Responsibilities of all parties
- Escalation procedures
- Cost/service tradeoffs
The SLA should also include provisions such as a mechanism for updating the agreement as required (due to changes in vendor capabilities and service requirements), a dispute resolution process, standards and methods for measuring service levels, protocols for initiating litigation, reporting processes, and indemnification clauses.
Why Is an SLA Important?
An SLA explicitly stipulates the level of service expected from internal or external sources by laying out the relevant metrics and key performance indicators (KPIs) that will be used to gauge the level of service delivered. An SLA not only sets expectations for the extent and quality of service delivery but also specifies the applicable penalties and possible remedies when requirements aren’t met. As such, an SLA acts as a scorecard, a set of criteria, and a valuable insurance policy that can support and accelerate business continuity initiatives.
What Is an SLA in Business Continuity?
Within the context of business continuity, an SLA represents a promise about the maximum time a business process or function may remain unavailable in the event of a disruption. It assumes the commitment of every party involved.
Through the business impact analysis (BIA), an organization will estimate the downtime it can tolerate for a given process or function. These limits are reflected in the SLA.
When considering IT systems, an SLA helps organizations conduct high-level risk assessments by detailing the requirements for availability, reliability, and the acceptable number of outages for the service provided. For example, the SLA may specify a service requirement of 99.5% guaranteed uptime and allow a maximum of two outage events per year, each lasting no more than three hours. The SLA may also specify remediation measures or recovery protocols depending on the nature of the outage event.
Such an SLA will help disaster recovery (DR) teams determine if the infrastructure’s quality can reliably support expected service levels and be redundant enough to adequately facilitate recovery strategies.
DR teams can extract or infer recovery time objectives (RTOs) and recovery point objectives (RPOs) that will meet the SLA requirement. This will help drive desirable outcomes for the rest of the business continuity and DR planning process.
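To make the example figures above concrete, the following minimal Python sketch turns an uptime percentage into a yearly downtime budget and checks a list of outages against the three example terms (99.5% uptime, at most two outage events per year, at most three hours each). The function names, the 8,760-hour year, and the rule that all three terms must hold at once are illustrative assumptions, not part of any specific contract or product.

```python
from typing import Sequence

# Assumption: a 365-day year; leap years are ignored for simplicity.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget implied by an uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


def meets_sla(outage_hours: Sequence[float],
              uptime_percent: float = 99.5,
              max_events: int = 2,
              max_hours_per_event: float = 3.0) -> bool:
    """Check one year's outages against all three example SLA terms.

    Hypothetical helper: a real SLA defines its own measurement rules.
    """
    within_count = len(outage_hours) <= max_events
    within_duration = all(h <= max_hours_per_event for h in outage_hours)
    within_budget = sum(outage_hours) <= allowed_downtime_hours(uptime_percent)
    return within_count and within_duration and within_budget


if __name__ == "__main__":
    budget = allowed_downtime_hours(99.5)
    event_cap = 2 * 3.0  # two events of at most three hours each
    print(f"Downtime budget at 99.5% uptime: {budget:.1f} hours per year")
    print(f"Downtime permitted by the event limits: {event_cap:.1f} hours per year")
    print("Two outages of 2.5 h and 1 h comply:", meets_sla([2.5, 1.0]))
    print("A single 4 h outage complies:", meets_sla([4.0]))
```

Under these assumptions, 99.5% uptime alone would permit about 43.8 hours of downtime per year, while the event limits permit at most 6 hours. In this example the event limits are the tighter constraint, so they are the figure a DR team would most likely plan against.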
SLA Metrics for Business Continuity
Maximum tolerable downtime and maximum tolerable data loss are two of the most important metrics of any business continuity plan. For any given business process, staying within these two target metrics is key to business continuity. When these metrics aren’t met, a business is likely to suffer negative impacts to operability, revenue, and brand reputation.
- Maximum Tolerable Downtime (MTD)
Maximum tolerable downtime, also referred to as maximum allowable downtime (MAD), is the longest downtime an organization can tolerate before facing serious repercussions.
The MTD for critical business processes is defined during the BIA. This will be a key element of the SLA associated with a particular business process.
The more critical a business process is to sustaining specific business functions, the shorter its MTD will be.
- Maximum Tolerable Data Loss (MTDL)
As with MTD, business continuity planning also determines the maximum amount of data or transactions the business can afford to lose for a specific business process or function. This limit is the maximum tolerable data loss (MTDL), measured in units of time. The MTDL directly informs the DR team about the RPO required to meet the SLA.
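The relationship between these two metrics and the recovery objectives can be sketched as a simple check. The Python example below assumes the common rule of thumb that a process's RTO should not exceed its MTD and its RPO should not exceed its MTDL. The class names, field names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class ContinuityTargets:
    """Limits taken from the BIA for one business process (illustrative)."""
    process: str
    mtd: timedelta   # maximum tolerable downtime
    mtdl: timedelta  # maximum tolerable data loss, expressed as time


@dataclass(frozen=True)
class RecoveryObjectives:
    """Objectives the DR design is expected to deliver (illustrative)."""
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective


def sla_gaps(targets: ContinuityTargets,
             objectives: RecoveryObjectives) -> List[str]:
    """Return a description of each objective that exceeds its limit.

    Assumption: RTO must be <= MTD and RPO must be <= MTDL.
    """
    gaps = []
    if objectives.rto > targets.mtd:
        gaps.append(f"{targets.process}: RTO {objectives.rto} exceeds MTD {targets.mtd}")
    if objectives.rpo > targets.mtdl:
        gaps.append(f"{targets.process}: RPO {objectives.rpo} exceeds MTDL {targets.mtdl}")
    return gaps


if __name__ == "__main__":
    # Sample values are made up for the sketch.
    billing = ContinuityTargets("billing", mtd=timedelta(hours=4),
                                mtdl=timedelta(minutes=15))
    design = RecoveryObjectives(rto=timedelta(hours=1), rpo=timedelta(hours=1))
    for gap in sla_gaps(billing, design):
        print(gap)
```

In this made-up case the one-hour RTO fits within the four-hour MTD, but the one-hour RPO exceeds the 15-minute MTDL, so only the RPO gap is reported. An empty list means the proposed objectives stay within both limits.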
A Solution to Meet Your Most Stringent Business Continuity SLAs
Zerto, a Hewlett Packard Enterprise company, understands that unplanned disruptions do not affect just IT operations: they have a domino effect on the entire organization. As a BIA will show, an organization’s reliance on technology to maintain operations and remain visible to the world increases steadily as it grows.
Zerto enables an always-on experience that transforms business-as-usual, helping organizations realize their innovation goals. It ensures that IT systems remain resilient through the identified potential disruptions and can deliver RPOs and RTOs that meet stringent SLAs featuring the shortest MTDs and MTDLs.