CONTINGENCY PLANNING

DR/HA & US Shutdown Contingency

Disaster recovery, high availability strategies, and contingency plans for scenarios where US cloud providers terminate service or experience control plane outages—whether through deliberate action, legal order, or technical failure.

Critical Context: The Threat is Real

Under US law (the CLOUD Act, FISA Section 702, IEEPA, and Executive Orders), the US government can compel AWS, Azure, GCP, and OCI to terminate services to, deny access for, or hand over data belonging to foreign governments and entities, all without notice. This is not theoretical: these powers have been exercised. This document provides contingency plans for immediate service termination scenarios.


Threat Scenarios

Scenario 1: Deliberate Service Termination

Trigger: US government orders cloud providers to terminate services to specific government entities or entire countries (sanctions, trade dispute, political conflict).

Warning Time | Impact | Probability
Zero to 24 hours | Complete loss of compute, storage, network | Low but increasing

Historical Precedent

  • Huawei: Immediate termination of US software/hardware services (2019)
  • Russia sanctions: Service termination to sanctioned entities (2022)
  • China export controls: Technology access restrictions (ongoing)

Contingency Response

  1. Immediate: Activate pre-deployed sovereign infrastructure
  2. Within 1 hour: DNS failover to sovereign endpoints
  3. Within 4 hours: Full traffic redirect to sovereign platform
  4. Data: Rely on continuously replicated data (see HA architecture below)

Scenario 2: Control Plane Outage (Technical)

Trigger: Major outage affecting cloud provider control plane (API, management, orchestration). Workloads may continue running but cannot be managed.

Warning Time | Impact | Probability
Zero (outage starts) | Cannot deploy, scale, or modify workloads | Moderate (has occurred)

Historical Examples

  • AWS us-east-1 outages (multiple incidents affecting global services)
  • Azure Active Directory outages (authentication failures)
  • GCP global networking incidents

Contingency Response

  1. Immediate: Workloads continue on existing resources (no changes possible)
  2. Within 15 minutes: Activate sovereign standby if outage confirmed major
  3. Traffic: Shift to sovereign platform via DNS/load balancer
  4. Return: Can return to US cloud once restored (if policy allows)
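
The control-plane/data-plane distinction in this scenario can be probed automatically: if the provider's management API is unreachable while the application itself still answers, you are in Scenario 2; if both are down, treat it as Scenario 1 until proven otherwise. A minimal sketch (the endpoint URLs are placeholders, not real provider addresses):

```python
import urllib.request
import urllib.error

# Placeholder endpoints -- substitute the provider's management API
# and your own application health URL.
MGMT_API = "https://management.example-cloud.com/health"
APP_HEALTH = "https://app.example.org/healthz"

def reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with any HTTP status at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # network-level failure: unreachable

def classify(mgmt_ok: bool, app_ok: bool) -> str:
    """Map the two probe results onto the scenario taxonomy above."""
    if mgmt_ok and app_ok:
        return "healthy"
    if not mgmt_ok and app_ok:
        return "control-plane outage"    # Scenario 2: running, unmanageable
    if mgmt_ok and not app_ok:
        return "application fault"       # our problem, not the provider's
    return "full outage or termination"  # treat as Scenario 1
```

The classifier's output feeds the confirm-and-classify step of the incident response procedure.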

Scenario 3: Data Access/Exfiltration Order

Trigger: US government orders cloud provider to provide access to government data without customer knowledge (FISA 702, National Security Letter).

Warning Time | Impact | Probability
No warning (gag order) | Data compromised without knowledge | High (documented occurrence)

Mitigation (Pre-Incident)

This scenario cannot be responded to after the fact; it can only be prevented.

  • Move data off US cloud infrastructure entirely
  • Encrypt all data with keys held in sovereign HSMs (US provider cannot decrypt)
  • Minimise data stored on US cloud to non-sensitive operational data only
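
The second mitigation can be sketched as client-side envelope encryption: each payload is encrypted with a fresh data key, and only an HSM-wrapped copy of that key is stored beside the ciphertext, so the US provider never holds anything it can decrypt. A sketch assuming the `cryptography` package; `hsm_wrap`/`hsm_unwrap` are stand-ins for a sovereign HSM's key-wrap interface (e.g. PKCS#11 underneath), not real library calls:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_us_cloud(plaintext: bytes, hsm_wrap) -> dict:
    """Encrypt client-side; only ciphertext and a wrapped key ever
    reach the US provider. `hsm_wrap` stands in for the sovereign
    HSM's key-wrap operation."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce, per AES-GCM convention
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    return {
        "ciphertext": ciphertext,           # safe to store on US cloud
        "nonce": nonce,
        "wrapped_key": hsm_wrap(data_key),  # only the HSM can unwrap
    }

def decrypt_from_us_cloud(blob: dict, hsm_unwrap) -> bytes:
    """Decryption requires the sovereign HSM to unwrap the data key."""
    data_key = hsm_unwrap(blob["wrapped_key"])
    return AESGCM(data_key).decrypt(blob["nonce"], blob["ciphertext"], None)
```

The design point is key separation: even under a compelled-access order, the provider can surrender only ciphertext, because the key-encryption key never leaves sovereign custody.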

High Availability Architecture (Pre-Migration)

During the migration period, critical systems must be deployed in active-active or active-standby configuration across US cloud AND sovereign infrastructure, enabling instant failover.

GLOBAL TRAFFIC MANAGEMENT

                Sovereign DNS / Global Load Balancer
                    /                         \
   US CLOUD (Primary)                    SOVEREIGN (Hot Standby)
   AWS / Azure / GCP                     CloudStack / Sovereign K8s
     • App Cluster                         • App Cluster
     • App Cluster                         • App Cluster
     • Database (RDS/Azure SQL)            • Database (PostgreSQL HA)
     • Object Storage (S3)                 • Object Storage (MinIO)
                    \                         /
                     Async Replication + Sync

Recovery Objectives

RTO: 1 hour (Recovery Time Objective: maximum time to restore service)
RPO: 15 minutes (Recovery Point Objective: maximum data loss, bounded by replication lag)

Continuous Replication Requirements

Data Type | Replication Method | Lag Target | Technology
Databases | Logical replication (async) | < 5 minutes | PostgreSQL logical replication, Debezium CDC
Object Storage | Continuous sync | < 15 minutes | rclone with change detection, MinIO bucket replication
Application State | Distributed cache sync | < 1 minute | Redis Cluster cross-region, Kafka mirroring
Secrets/Config | Periodic sync | < 1 hour | OpenBao replication, GitOps
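
The lag targets above translate directly into an alerting rule: any data type whose measured replication lag exceeds its target means a failover at that moment would lose more data than the RPO allows. A minimal check (lag measurements would come from e.g. `pg_stat_replication` for databases; the dictionary keys are illustrative):

```python
# Lag targets from the table above, expressed in seconds.
LAG_TARGETS = {
    "databases": 5 * 60,
    "object_storage": 15 * 60,
    "application_state": 60,
    "secrets_config": 60 * 60,
}

def check_lag(measured: dict) -> list:
    """Return the data types whose measured replication lag (seconds)
    breaches its target -- i.e. where a failover right now would
    exceed the acceptable data loss."""
    return [
        data_type
        for data_type, lag in measured.items()
        if lag > LAG_TARGETS[data_type]
    ]
```

Any non-empty result should page the on-call team, since the hot standby is no longer within its recovery objectives.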

US Shutdown Incident Response Procedure

T+0: Incident Detected

Trigger: US cloud services inaccessible or termination notice received

  • Automated monitoring alerts on US cloud API failures
  • Manual escalation if external notification received
  • Incident commander assigned immediately
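
A common way to implement the automated-alert step without paging on every transient blip is to require several consecutive probe failures before declaring an incident; the T+5 confirmation step still applies afterwards. A sketch (the threshold of 3 is an assumption, tuned to your probe interval):

```python
from collections import deque

class OutageDetector:
    """Fire an alert only after `threshold` consecutive probe failures,
    to avoid paging the incident commander on a single monitoring
    false positive."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # rolling probe window

    def record(self, probe_succeeded: bool) -> bool:
        """Feed one probe result; return True when an alert should fire."""
        self.recent.append(probe_succeeded)
        return (len(self.recent) == self.threshold
                and not any(self.recent))
```
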

T+5 min: Confirm & Classify

Determine scope and intentionality

  • Confirm outage is real (not monitoring false positive)
  • Check cloud provider status pages
  • Classify: Technical outage vs. deliberate termination
  • If deliberate: Escalate to senior leadership immediately

T+15 min: Failover Decision

Authorise failover to sovereign infrastructure

  • Technical outage expected > 1 hour: Initiate failover
  • Deliberate termination: Initiate failover immediately
  • Notify all jurisdiction coordination centres

T+30 min: Execute Failover

Traffic redirect to sovereign platform

  • Update DNS records (low TTL should be pre-configured)
  • Update global load balancer weights
  • Verify sovereign endpoints accepting traffic
  • Monitor error rates and latency
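
The DNS portion of this step can be sketched as follows. The `dns_client.update` call is an assumed interface for whatever sovereign DNS provider is in use, not a real library API; the operational point is that the low TTL must have been configured in peacetime, because lowering it during the incident cannot expire answers that resolvers have already cached:

```python
import time

def failover_to_sovereign(dns_client, record: str, sovereign_ip: str,
                          check_health, old_ttl: int = 60) -> None:
    """Repoint `record` at the sovereign platform.

    `dns_client` and `check_health` are injected stand-ins for the
    sovereign DNS provider's API and an endpoint health probe."""
    # 1. Verify the sovereign endpoint is accepting traffic *before*
    #    pointing users at it.
    if not check_health(sovereign_ip):
        raise RuntimeError("sovereign endpoint not healthy; aborting failover")
    # 2. Repoint the record, keeping the pre-configured low TTL.
    dns_client.update(record, value=sovereign_ip, ttl=60)
    # 3. Wait out the previous TTL so cached answers expire, then
    #    monitoring (error rates, latency) takes over.
    time.sleep(old_ttl)
```
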

T+1 hour: Stabilise

Confirm sovereign platform operational

  • All critical services confirmed operational
  • Data integrity verification
  • Capacity scaling as needed
  • Stakeholder communication issued

T+24 hours: Assess Return

Determine if return to US cloud is appropriate (if technical outage)

  • If deliberate termination: No return. Accelerate full migration.
  • If technical outage resolved: Assess risk of return vs. staying on sovereign
  • Recommendation: Use incident to justify permanent sovereign migration

Sovereign DR Site Strategy by Jurisdiction

Jurisdiction | Primary Site | DR Site | Cross-Jurisdiction DR
UK | Crown Hosting (Corsham) | Crown Hosting (Farnborough) | EU (OVHcloud France); data agreement required
EU | Primary varies by member state | Secondary within same member state | Cross-member-state replication (Gaia-X)
Canada | SSC Borden (Ontario) | SSC Gatineau (Quebec) | UK (bilateral agreement possible)
Australia | Canberra DC | Sydney/Melbourne | Limited (geographic isolation); consider NZ partnership

Cross-Jurisdiction DR Consideration: Data sovereignty requirements may limit cross-border DR for some workloads. Each jurisdiction must define which data classifications can be replicated to partner jurisdictions and under what agreements. GDPR adequacy decisions and bilateral data sharing agreements govern these arrangements.

DR Testing Requirements

Test Type | Frequency | Scope | Success Criteria
Failover Test | Monthly | Single critical service | Failover completed within RTO; no data loss beyond RPO
Full DR Exercise | Quarterly | All critical services | Full service restoration on sovereign platform
US Shutdown Simulation | Bi-annually | Complete cutover simulation | Zero US cloud dependency for 24 hours
Cross-Jurisdiction Test | Annually | Multi-jurisdiction coordination | Coordinated failover across cooperative
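
The monthly failover test's success criteria reduce to two comparisons against the recovery objectives defined earlier (RTO 1 hour, RPO 15 minutes), which is worth automating so test results are judged consistently. A minimal sketch:

```python
# Recovery objectives from the "Recovery Objectives" section, in seconds.
RTO_SECONDS = 60 * 60   # 1 hour
RPO_SECONDS = 15 * 60   # 15 minutes

def failover_test_passed(restore_seconds: float,
                         max_replication_lag_seconds: float) -> bool:
    """True when service was restored within the RTO and the worst
    observed replication lag stayed within the RPO."""
    return (restore_seconds <= RTO_SECONDS
            and max_replication_lag_seconds <= RPO_SECONDS)
```
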
