DR 101: Recovery Point Objectives (RPOs) — Definition and Drivers (Updated in June 2023)
Data has become as important a resource to organizations as infrastructure and physical assets. Data loss can cost an organization thousands or even millions of dollars. To minimize data loss, organizations turn to disaster recovery solutions, which are built around recovery point objectives, with the goal of recovering as much data as possible after a disaster. In this blog we’ll discuss recovery point objectives in depth, but we can begin with a very simple definition:
A recovery point objective is the point in time you would like to restore to in the event of a disaster.
But there is more to it. Let’s investigate what a recovery point objective (RPO) really means and how it adds value to a disaster recovery plan.
RPO: A Goal for Minimizing Data Loss
An RPO is, as the word “objective” implies, a goal for minimizing data loss in a disaster scenario. It is typically defined in a service-level agreement (SLA) with the internal or external customers of key data and systems. RPOs may also be defined by regulatory requirements for certain industries and governmental organizations.
RPOs are one of the measurements of disaster recovery effectiveness. While an RPO goal may be defined in your disaster recovery plans, the RPO you can actually achieve is determined by the disaster recovery tools you have in place and your ability to use them effectively. RPOs must be measured frequently to ensure you are meeting or surpassing your goals so that when disaster strikes, data can be recovered with an acceptable amount of loss.
Measuring the Amount of Acceptable Loss
RPOs are expressed as units of time, from days to minutes or seconds. Rather than measuring data itself, RPOs measure the time between the moment of data loss and the last point in time from which data can be recovered. Here’s a scenario:
ACME Corporation backs up their data every 12 hours, at 6 a.m. and 6 p.m. daily. If ACME Corporation experienced a disaster in which data was lost at 2 p.m., then the nearest point in time from which they could recover—the 6 a.m. backup—would be 8 hours prior to the disaster. If they recovered to that point, they would lose 8 hours of data. However, this does not mean that ACME Corporation has an RPO of 8 hours.
If the disaster instead hit at 5:59 p.m., just before the next backup was scheduled, the nearest recovery point would still be 6 a.m. Now the data loss becomes 11 hours and 59 minutes or, rounding up, 12 hours. A 12-hour RPO is the lowest that ACME’s disaster recovery plan can guarantee, because consecutive recovery points are 12 hours apart. Even this measurement assumes that the nearest recovery point is reliable. If for some reason it isn’t, then an earlier recovery point must be used, increasing the achieved RPO further.
The amount of data that ACME Corporation might lose over that 12-hour period varies with the amount of data they are creating during that time. The value of that data can vary too, but losing it always has a business impact: lost productivity, lost intellectual property, damaged reputation, and even regulatory fines. If ACME Corporation determines that 12 hours of data loss is acceptable, then this disaster recovery plan is good for them.
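The ACME arithmetic above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: backups run at fixed hours every day, every backup is usable, and the function names are ours rather than part of any product.

```python
from datetime import datetime, timedelta

def data_loss_window(disaster, backup_hours):
    """Time between the disaster and the latest scheduled backup at or before it."""
    candidates = []
    # Look at today's and yesterday's backups so early-morning disasters work too.
    for day_offset in (0, 1):
        day = (disaster - timedelta(days=day_offset)).date()
        for hour in backup_hours:
            point = datetime(day.year, day.month, day.day, hour)
            if point <= disaster:
                candidates.append(point)
    return disaster - max(candidates)

def worst_case_loss(backup_hours):
    """Largest gap between consecutive daily backups (the lowest RPO the schedule can guarantee)."""
    hours = sorted(backup_hours)
    gaps = [
        (hours[(i + 1) % len(hours)] - h) % 24 or 24  # a single daily backup means a 24-hour gap
        for i, h in enumerate(hours)
    ]
    return timedelta(hours=max(gaps))

acme_schedule = [6, 18]  # 6 a.m. and 6 p.m.
print(data_loss_window(datetime(2023, 6, 1, 14, 0), acme_schedule))   # 8:00:00
print(data_loss_window(datetime(2023, 6, 1, 17, 59), acme_schedule))  # 11:59:00
print(worst_case_loss(acme_schedule))                                 # 12:00:00
```

The first two results are the data loss for one particular disaster time. Only the last one, the worst case, tells you which RPO the schedule can actually support.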
Tiering RPOs for Different Data
The reality is that few organizations can afford to lose 12 hours of business-critical data. Instead, organizations are looking to solutions that can provide an RPO of minutes or even seconds for their most critical data.
For less critical data, such as data on internal test/dev systems, a longer RPO might be acceptable. Therefore, many organizations tier data based on business criticality: a lower RPO (measured in minutes or seconds) for some data and a higher RPO (measured in hours or even days) for other data. By defining multiple RPOs, organizations can save money by using low-cost backup solutions for less critical data and reserving low-RPO disaster recovery solutions for business-critical data.
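One way to picture tiering is as a small lookup table of RPO targets that each protection method is checked against. The tier names and target values below are hypothetical and chosen only for illustration. Your own tiers should come from your SLAs and regulatory requirements.

```python
from datetime import timedelta

# Hypothetical tiers and RPO targets, for illustration only.
RPO_TARGETS = {
    "business-critical": timedelta(seconds=30),
    "standard": timedelta(hours=4),
    "test-dev": timedelta(hours=24),
}

def meets_rpo(tier, recovery_point_interval):
    """True if the worst-case gap between recovery points fits the tier's RPO target."""
    return recovery_point_interval <= RPO_TARGETS[tier]

twelve_hour_backup = timedelta(hours=12)
print(meets_rpo("test-dev", twelve_hour_backup))           # True
print(meets_rpo("business-critical", twelve_hour_backup))  # False
```

In this sketch a 12-hour backup schedule is enough for the test/dev tier but not for business-critical data, which is the cost trade-off that tiering is meant to expose.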
Main Drivers for RPOs
The RPOs you can achieve will mostly be determined by the technology solutions you have or plan to have in place for disaster recovery and backup. Bandwidth and quality of service (QoS) may also play a role, especially if you want to achieve aggressive RPOs. Here’s how to achieve or surpass your RPOs.
Your Technology Solution
There are many kinds of backup and disaster recovery solutions to choose from, and each of them achieves RPOs differently.
A traditional backup solution, even one based on snapshot technologies, periodically creates different backups of applications, data, and even entire virtual machines. Although snapshots themselves can theoretically be taken frequently, they are usually taken every few hours because taking snapshots negatively impacts performance. For this reason, backup solutions are really only suitable for long-term backup retention or protecting low-tier data in a disaster scenario.
Replication solutions can achieve low RPOs through synchronous replication. Synchronous replication is very limited in geographic range and often cost-prohibitive to implement, but it can achieve near-zero RPOs when implemented properly. Replication also includes snapshot-based technologies that are unable to achieve RPOs of seconds or minutes. The RPOs a given replication solution can achieve in real implementations may vary.
• Continuous Data Protection
Generally, continuous data protection (CDP) creates many recovery points in a short time so data can be recovered from any point as needed. The success of a CDP solution depends on its design, but the core principle of multiple recent recovery points can help ensure low RPOs. When the most recent recovery point cannot be used successfully, the next recovery point may be used.
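The fallback described above can be sketched as a simple search backward through the available recovery points. This is a simplified model, not how any particular CDP product works: `effective_loss` and the `is_usable` check are hypothetical names standing in for whatever validation your solution performs.

```python
from datetime import datetime, timedelta

def effective_loss(disaster, recovery_points, is_usable):
    """Walk back from the newest recovery point until a usable one is found.

    Returns the resulting data loss window, or None if no point is usable.
    """
    for point in sorted(recovery_points, reverse=True):
        if point <= disaster and is_usable(point):
            return disaster - point
    return None

disaster = datetime(2023, 6, 1, 14, 0)

# Recovery points 12 hours apart, with the newest one unusable.
backups = [datetime(2023, 5, 31, 18, 0), datetime(2023, 6, 1, 6, 0)]
print(effective_loss(disaster, backups, lambda p: p != backups[-1]))  # 20:00:00

# Recovery points 5 seconds apart, with the newest one unusable.
cdp_points = [disaster - timedelta(seconds=5 * i) for i in range(1, 4)]
print(effective_loss(disaster, cdp_points, lambda p: p != cdp_points[0]))  # 0:00:10
```

With recovery points hours apart, one bad point adds hours of loss. With points seconds apart, the same failure adds only seconds.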
• Application-Centric Protection
File and folder data is relatively simple to protect and recover, but applications are far more complex. An application can span multiple virtual machines or containers and rely on data stored in multiple locations to function properly. Recovering an application to a specific point in time requires a consistency in protection that is hard for most replication solutions to achieve. It is important to consider how often a replication solution can create a consistent recovery point from which an application can recover.
Bandwidth is not unlimited. Your RPOs can only be measured after the data has been received at the recovery site and is fully available for recovery. While real-time replication helps achieve low RPOs, bandwidth can be an issue during high rates of data change where replication is moving larger amounts of data. Latency and network disruptions may also affect available bandwidth.
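To get a feel for how a burst of data change can push up RPOs, here is a rough back-of-the-envelope model. It assumes data is replicated in the order it was written, that the link starts with no backlog, and that latency and disruptions are ignored. The function name and parameters are our own and purely illustrative.

```python
def replication_lag_seconds(change_rate_mbps, link_rate_mbps, burst_seconds):
    """Approximate how far the recovery site is behind at the end of a burst.

    While data changes faster than the link can carry it, the recovery site
    only holds writes made up to (link_rate / change_rate) of the way through
    the burst. The remainder of the burst is the lag.
    """
    if change_rate_mbps <= link_rate_mbps:
        return 0.0  # the link keeps up, so no backlog builds
    return burst_seconds * (1 - link_rate_mbps / change_rate_mbps)

# A 10-minute burst at twice the available replication bandwidth.
print(replication_lag_seconds(200, 100, 600))  # 300.0
# The same burst on a link with headroom.
print(replication_lag_seconds(50, 100, 600))   # 0.0
```

Under these assumptions, a 10-minute burst at double the link rate leaves the recovery site about 5 minutes behind, even if replication normally runs only seconds behind.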
Using QoS may further restrict the bandwidth available to replication, and this can impact RPOs. You should consider the importance of RPOs relative to other network processes when configuring your QoS. By planning ahead, you can ensure that bandwidth is available to achieve the RPOs you need.
Testing and Reporting
If you’ve recovered before, then you probably know that no recovery is guaranteed. And if your first recovery attempt is not successful, then you must recover from the next available recovery point. With backup solutions, these can be hours apart. But with continuous data protection solutions, these can be seconds apart.
The only way to know whether a recovery will succeed is through testing, which is one of the best ways to ensure you can achieve your RPOs. Testing your recovery not just before but also during a disaster helps ensure you have good recovery data, especially in a cyberattack.
Testing also generates reporting. With reports, you can identify and address any issues where RPOs are not meeting requirements. Reporting may also satisfy stakeholder requirements for regulatory compliance. Regular testing and reporting ensure that your RPOs are meeting targets when disaster strikes.
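As a minimal sketch of what such a report might contain, the snippet below summarizes a set of measured RPO samples against a target. The sample values, field names, and the `rpo_report` function are made up for illustration. A real report would draw on whatever measurements your disaster recovery tooling records.

```python
from datetime import timedelta

def rpo_report(samples, target):
    """Summarize measured RPO samples against a target RPO."""
    if not samples:
        raise ValueError("at least one RPO sample is required")
    breaches = [s for s in samples if s > target]
    return {
        "samples": len(samples),
        "worst": max(samples),
        "breaches": len(breaches),
        "compliance_pct": 100.0 * (len(samples) - len(breaches)) / len(samples),
    }

# Hypothetical measurements, in seconds, against a 15-second target.
measured = [timedelta(seconds=s) for s in (5, 8, 12, 40)]
report = rpo_report(measured, target=timedelta(seconds=15))
print(report["worst"])           # 0:00:40
print(report["breaches"])        # 1
print(report["compliance_pct"])  # 75.0
```

A single breach like the 40-second sample is the kind of issue a report should surface, so it can be investigated before a real disaster.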
Achieving the Lowest RPOs in the Industry with Zerto
Lowering RPOs and RTOs to within a few seconds, at scale and across sites, is not only possible but also simple with Zerto. Zerto consistently achieves the lowest RTOs and RPOs in the industry to help businesses become ransomware resilient. Using its own CDP technology, which combines real-time replication, recovery point journaling, and application-centric protection, Zerto adapts your organization’s disaster recovery strategy to keep RPOs as low as possible.