Explainer

SOC 2 disaster recovery: RTO, RPO, and proving you can recover

Disaster recovery sits under the Availability criteria, where auditors expect a documented plan with defined RTO and RPO targets and, critically, evidence that you tested it. An untested plan is the most common finding.

Disaster recovery under the Availability criteria

Disaster recovery is the technical heart of the Availability category in SOC 2, addressing how you restore systems, infrastructure, and data after a significant outage. It maps most directly to A1.2, which expects data backup processes and recovery infrastructure to be designed, implemented, operated, and monitored, and to A1.3, which expects you to test recovery plan procedures against your objectives. It also connects to CC7.5 in the Common Criteria, which covers recovering from identified security incidents and improving procedures based on what testing reveals. Disaster recovery is narrower than business continuity, which keeps the whole business functioning, and broader than backups, which are one input to recovery. An auditor evaluating Availability will look for a documented disaster recovery plan, defined recovery targets, and proof the plan actually works.

RTO and RPO: the two numbers that drive everything

Recovery time objective, or RTO, is the maximum acceptable time a system can be down before the consequences become unacceptable; recovery point objective, or RPO, is the maximum acceptable amount of data loss measured in time, which dictates how frequently you must back up or replicate. These two targets shape almost every downstream decision, including backup cadence, replication technology, and infrastructure cost. Tighter targets cost more: a near-zero RPO typically demands continuous replication or mirroring, and a short RTO demands warm standby or automated failover rather than rebuilding from scratch. Critically, SOC 2 does not set these numbers for you. You define them, ideally informed by a business impact analysis, and the auditor then checks whether your recovery capability and test results are consistent with the targets you committed to.
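To make the link between RPO and backup cadence concrete, here is a minimal sketch of the arithmetic. It assumes periodic backups that capture state as of their start time and only become usable once they finish; the function names and the sample durations are illustrative, not taken from any standard.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Worst-case data loss for periodic backups.

    Assumption: a backup captures state as of its start time and is usable
    only once it completes. A failure just before a backup completes falls
    back to the previous one, so the exposure is interval + duration.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              backup_duration: timedelta,
              rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical target: lose no more than one hour of data.
rpo = timedelta(hours=1)

# Hourly backups that take 10 minutes expose up to 70 minutes of data.
hourly = meets_rpo(timedelta(hours=1), timedelta(minutes=10), rpo)

# Backing up every 45 minutes brings the exposure down to 55 minutes.
every_45_min = meets_rpo(timedelta(minutes=45), timedelta(minutes=10), rpo)

print(hourly, every_45_min)  # False True
```

The point of the example is that a backup interval equal to the RPO is usually not enough on its own, because the time a backup takes to complete adds to the exposure.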

Failover, redundancy, and cloud architecture

How you meet your recovery targets is an architectural decision, and modern cloud platforms make several patterns practical. Multi-availability-zone deployments protect against the loss of a single data center, while multi-region replication protects against a larger regional failure and supports more demanding recovery objectives; replicating to a separate account additionally guards against compromise or accidental deletion in the primary account. Managed database services often offer automated failover and point-in-time recovery, which can satisfy tighter RPOs without heavy custom engineering. The important nuance is the shared responsibility model: a cloud provider keeps its infrastructure resilient, but configuring redundancy, replication, and failover for your specific workload remains your responsibility. Auditors will want to understand the architecture well enough to judge whether it can plausibly deliver the RTO and RPO you have documented, so the plan should describe the actual recovery mechanism rather than gesturing at the cloud.

How disaster recovery relates to backups and continuity

These three concepts are layered, and confusing them leads to gaps. Backups are the protected copies of data that make recovery possible; disaster recovery is the plan and infrastructure that uses those copies to restore systems within your targets; business continuity is the wider effort to keep the organization operating, with disaster recovery as its technical engine. A backup with no disaster recovery plan leaves you with data but no defined path to a running system, while a disaster recovery plan with unreliable backups is built on sand. Recovery objectives tie the layers together: your RPO sets how often backups or replication must occur, and your RTO sets how fast the recovery procedure must complete. Auditors expect to see this coherence, where the backup cadence, the recovery design, and the stated objectives all agree with one another.
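The coherence described above can be checked mechanically across a system inventory. The sketch below is one possible shape for such a check; the class, its field names, and the two sample systems are hypothetical, and the restore estimate is assumed to come from your own documented procedure or past tests.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemRecoveryProfile:
    name: str
    rpo: timedelta                 # documented recovery point objective
    rto: timedelta                 # documented recovery time objective
    backup_interval: timedelta     # how often backups or replication run
    estimated_restore: timedelta   # how long the recovery procedure takes


def coherence_findings(profile: SystemRecoveryProfile) -> list[str]:
    """Return a description of each mismatch between cadence, design, and objectives."""
    findings = []
    if profile.backup_interval > profile.rpo:
        findings.append(
            f"{profile.name}: backup interval {profile.backup_interval} "
            f"exceeds RPO {profile.rpo}"
        )
    if profile.estimated_restore > profile.rto:
        findings.append(
            f"{profile.name}: estimated restore {profile.estimated_restore} "
            f"exceeds RTO {profile.rto}"
        )
    return findings


# Hypothetical systems for illustration only.
billing = SystemRecoveryProfile(
    name="billing-db",
    rpo=timedelta(minutes=15), rto=timedelta(hours=1),
    backup_interval=timedelta(minutes=5), estimated_restore=timedelta(minutes=45),
)
reporting = SystemRecoveryProfile(
    name="reporting-db",
    rpo=timedelta(hours=1), rto=timedelta(hours=4),
    backup_interval=timedelta(hours=24), estimated_restore=timedelta(hours=6),
)

for system in (billing, reporting):
    for finding in coherence_findings(system):
        print(finding)
```

Here the first system produces no findings, while the second is flagged twice: nightly backups cannot support a one-hour RPO, and a six-hour restore cannot meet a four-hour RTO.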

What auditors test and the evidence they want

An auditor assessing disaster recovery wants two things: that the plan is documented and that it has been tested within the audit period. The plan should define recovery objectives, name the systems in scope, describe the recovery steps and the failover mechanism, assign roles, and reference the backups it depends on. The testing evidence is where reports most often fall short, so auditors look for disaster recovery test records that capture the scenario, the date, who participated, the systems recovered, the actual recovery time and data loss measured against your RTO and RPO, the pass or fail outcome, and the corrective actions taken. For a Type 2 examination they need to see that this test occurred during the observation window. A plan reviewed within the last twelve months, together with a documented test that fed improvements back into it, is the combination that satisfies A1.3.
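The fields listed above translate naturally into a structured test record. The following is a sketch of one way to capture them, not a prescribed SOC 2 format; the class name, field names, and every value in the sample record are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DrTestRecord:
    scenario: str
    test_date: date
    participants: list[str]
    systems_recovered: list[str]
    rto: timedelta
    rpo: timedelta
    actual_recovery_time: timedelta
    actual_data_loss: timedelta
    corrective_actions: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        """Pass only if both measured values are within the stated targets."""
        return (self.actual_recovery_time <= self.rto
                and self.actual_data_loss <= self.rpo)

    def in_observation_window(self, start: date, end: date) -> bool:
        """True if the test date falls inside the Type 2 observation window."""
        return start <= self.test_date <= end


# Hypothetical record: recovery time met the RTO, data loss missed the RPO.
record = DrTestRecord(
    scenario="Loss of primary database region",
    test_date=date(2024, 3, 14),
    participants=["on-call engineer", "database lead"],
    systems_recovered=["billing-db"],
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
    actual_recovery_time=timedelta(hours=3),
    actual_data_loss=timedelta(minutes=20),
    corrective_actions=["Shorten replication lag alert threshold"],
)

print(record.passed)
print(record.in_observation_window(date(2024, 1, 1), date(2024, 12, 31)))
```

Deriving the pass or fail outcome from the measured values, rather than entering it by hand, keeps the record consistent with the targets it is judged against. A failed test with corrective actions recorded is still useful evidence.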

Common gaps and practical guidance

The dominant finding is an untested plan: a polished document describing failover that nobody has ever exercised, which gives an auditor no basis to believe it works. Other frequent gaps include recovery objectives that exist on paper but are inconsistent with the actual architecture, tests that never measure real recovery time against the stated RTO, and a plan that has gone stale as the infrastructure evolved underneath it. The practical approach is to set RTO and RPO targets you can defend, build redundancy and failover that can realistically meet them, and then schedule at least an annual recovery test that measures actual performance and records the results. Where a full live failover is too disruptive, a structured tabletop or a partial restore in a non-production environment is acceptable as long as it is documented and honest about what was and was not exercised.
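Two of the habits above are easy to automate: noticing when the annual test is overdue, and stating plainly which recovery steps a partial exercise skipped. The sketch below assumes a 365-day cadence and a plan expressed as a set of named steps; the function names, step names, and dates are all hypothetical.

```python
from datetime import date, timedelta


def next_test_due(last_test: date, max_age_days: int = 365) -> date:
    """Date by which the next recovery test should have run."""
    return last_test + timedelta(days=max_age_days)


def is_test_overdue(last_test: date, today: date, max_age_days: int = 365) -> bool:
    """True if more than max_age_days have passed since the last test."""
    return today > next_test_due(last_test, max_age_days)


def steps_not_exercised(plan_steps: set[str], exercised: set[str]) -> list[str]:
    """Steps in the plan that the exercise did not cover, sorted for reporting."""
    return sorted(plan_steps - exercised)


# Hypothetical plan and a partial restore in a non-production environment.
plan_steps = {"restore database", "redirect DNS", "fail over app tier", "notify customers"}
exercised = {"restore database", "fail over app tier"}

last_test = date(2024, 1, 10)
today = date(2025, 2, 1)

print(is_test_overdue(last_test, today))           # True
print(steps_not_exercised(plan_steps, exercised))  # ['notify customers', 'redirect DNS']
```

Recording the skipped steps alongside the result is what keeps a tabletop or partial restore honest about its scope.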