RPO & RTO. What about RTPO?
You know these:
- RPO – recovery point objective
- RTO – recovery time objective
But what is RTPO?
- RTPO – Return to Production Objective
Everyone needs to consider the time & effort it will take to fail back to their production equipment, location, and systems. Yes, I made up the abbreviation RTPO. I didn’t make up the need, though.
Unfortunately, not every solution, or the way it is implemented, takes into account the effort or time it will take to return to the production environment.
You don’t want to go through a disaster and then turn around weeks later and do all the work over again to Return to Production systems & locations! It’s like scheduling a second disaster. Consider your RTPO when setting up your DR strategy.
When I help with storage solutions, Disaster Recovery (DR) is always part of the conversation, so here is a high-level chart of the options for DR.
Scale 0-5 with lower being better
(just a sampling)
|1||1||4||Replication can rarely use synchronous writes because of the latency and distance between the production and DR sites, so the data is never 100% up to date. Even with asynchronous writes or snapshots, though, the time to recover is usually short and the recovery point is usually recent.
The concern is returning to production. Traditional solutions, depending on the type of issue or how long you ran out of the DR site, can force a full re-sync of data to get back to production. Newer solutions and improved options do exist that sync only the change deltas back to the production site. Know what your SAN can do to get back to production!
· There are many, many SAN providers.
|Hypervisor replication / DR tools
|1||1||3||Hypervisor vendors figured out long ago how to do high availability in a low-latency, shared-storage environment such as a single data center. Current tools, with the right licensing, can extend many of those capabilities to a DR site. The tools, licensing, and exact abilities vary with each hypervisor & may also depend on your SAN/storage replication ability.
Your return to production depends on the various components for your hypervisor and SAN. Be sure to know your Hypervisor and your SAN’s capabilities if using Hypervisor replication tools.
· Citrix XenServer
· KVM – various providers
|VM Replication tools at the Application layer||2||1||2||Replication tools of this type usually interact at the VM or Hypervisor level. They don’t rely on specific SAN or Hypervisor tools (some do need Hypervisor APIs) to replicate your data. The advantage is that the secondary site & technology don’t need to look exactly like production. You may be using older or slower but still capable storage & VM hosts. You could also use a hybrid cloud solution so you don’t own any of the infrastructure at the DR site. An extra benefit is that these tools tend to be less complex than SAN/hypervisor replication setups, as they are designed to be one application for both replication and recovery.
Return to production uses delta changes and is done at a per-VM or VM-group level, not a full SAN or site level.
|Backup software / appliances
|4||4||4||Traditional backup solutions have the advantage of long-term retention and many recovery points. Unfortunately, they often create a longer recovery time as data has to be decrypted, copied, and/or restored from a 3rd party medium (HD or tape). Some solutions do have rapid recovery options that provide immediate temporary servers (read-only VMs with data change logs) or even delta restores that overwrite the production VM. These advancements are not consistent across vendors, so know your options.
In general, these backup tools often take longer to restore and eventually require another restore back to production equipment if the ‘quick’ temp restore or recovery options were used.
|Application/DB specific tools||1||1||2||These tools are built right into specific applications or platforms that require high availability, like databases and mail servers. When DB clusters, DAGs, or built-in application replication are used for HA, they provide very quick recovery and, by design, an easy return to production environments.
To return to production, these systems usually try to replicate just the data changes first. Only if that is not possible do they reseed from a recent backup and then replicate the changes made since.
|· SQL Server Always On clusters
· Distributed File System (DFS)
· Exchange DAGs
· Oracle clusters & replication options.
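To put rough numbers on the full re-sync versus delta sync difference mentioned in the chart, here is a minimal Python sketch. It is back-of-the-envelope math, not a vendor tool. The function names (transfer_hours, failback_estimate), the 70% link efficiency, and the sample sizes are all assumptions I picked for illustration, so plug in your own data size, change rate, and link speed.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Hours needed to copy data_gb gigabytes over a link_mbps link.

    efficiency is the assumed usable fraction of the link
    (0.7 is a hypothetical default, not a measured value).
    """
    if data_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("need data_gb >= 0, link_mbps > 0, 0 < efficiency <= 1")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def failback_estimate(total_gb, daily_change_gb, days_at_dr, link_mbps,
                      efficiency=0.7):
    """Compare a full re-sync with a delta-only sync back to production."""
    # Changed data can never exceed the full data set, so cap the delta.
    delta_gb = min(total_gb, daily_change_gb * days_at_dr)
    return {
        "full_resync_hours": transfer_hours(total_gb, link_mbps, efficiency),
        "delta_sync_hours": transfer_hours(delta_gb, link_mbps, efficiency),
    }


if __name__ == "__main__":
    # Hypothetical example: 20 TB of data, 200 GB of change per day,
    # two weeks running at the DR site, 1 Gbps link between sites.
    estimate = failback_estimate(total_gb=20_000, daily_change_gb=200,
                                 days_at_dr=14, link_mbps=1_000)
    for name, hours in estimate.items():
        print(f"{name}: {hours:.1f}")
```

This only counts transfer time. Validation, cut-over, and any outage window are extra, which is exactly why RTPO deserves its own line in your DR plan.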
I am not recommending one type of solution over another. The ‘right’ solution or mix of solutions depends on your risk, architecture, skillsets, and budget. I’ve helped set up all types of DR solutions and many clients have a combination of these solutions to meet their specific business needs or regulations.
I’ve seen a return to production turn into a scheduled outage, with all the ‘disaster’ work redone just to move back to production systems. Design and choose your DR strategy carefully & remember the Return to Production Objective is as important as RTO & RPO.
LostCreek Fintech handles all aspects of storage solutions and has partners who are experts in multiple disciplines to help our clients with other IT needs. Contact us at firstname.lastname@example.org for more info.