Disaster recovery, high availability strategies, and contingency plans for scenarios where US cloud providers terminate service or experience control plane outages—whether through deliberate action, legal order, or technical failure.
## Critical Context: The Threat Is Real

Under US law (the CLOUD Act, FISA Section 702, IEEPA, and executive orders), the US government can compel AWS, Azure, GCP, and OCI to terminate services to foreign governments or entities, deny them access, or hand over their data, all without notice. This is not theoretical; these powers have been exercised. This document provides contingency plans for immediate service-termination scenarios.
## Threat Scenarios

### Scenario 1: Deliberate Service Termination

**Trigger:** US government orders cloud providers to terminate services to specific government entities or entire countries (sanctions, trade dispute, political conflict).
| Warning Time | Impact | Probability |
|---|---|---|
| Zero to 24 hours | Complete loss of compute, storage, network | Low but increasing |
#### Historical Precedent
- Huawei: Immediate termination of US software/hardware services (2019)
- Russia sanctions: Service termination to sanctioned entities (2022)
- China export controls: Technology access restrictions (ongoing)
#### Contingency Response
- Immediate: Activate pre-deployed sovereign infrastructure
- Within 1 hour: DNS failover to sovereign endpoints
- Within 4 hours: Full traffic redirect to sovereign platform
- Data: Rely on continuously replicated data (see HA architecture below)
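The DNS failover step above can be sketched as a record-rewrite pass; the hostnames are hypothetical stand-ins, and a real runbook would call the DNS provider's API rather than manipulate records in memory:

```python
from dataclasses import dataclass

# Hypothetical endpoints; real values come from the infrastructure inventory.
US_ENDPOINT = "app.us-cloud.example.com"
SOVEREIGN_ENDPOINT = "app.sovereign.example.org"

@dataclass
class DnsRecord:
    name: str
    target: str
    ttl: int  # seconds; kept low in advance so failover propagates quickly

def failover_records(records: list[DnsRecord]) -> list[DnsRecord]:
    """Repoint every record that targets the US cloud at the sovereign platform."""
    return [
        DnsRecord(r.name, SOVEREIGN_ENDPOINT, r.ttl) if r.target == US_ENDPOINT else r
        for r in records
    ]

records = [
    DnsRecord("www.example.org", US_ENDPOINT, 60),
    DnsRecord("static.example.org", "cdn.sovereign.example.org", 60),
]
updated = failover_records(records)
```

Records already pointing at sovereign endpoints pass through untouched, so the pass is safe to run repeatedly during an incident.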
### Scenario 2: Control Plane Outage (Technical)

**Trigger:** Major outage affecting a cloud provider's control plane (APIs, management consoles, orchestration). Workloads may continue running but cannot be managed.
| Warning Time | Impact | Probability |
|---|---|---|
| Zero (outage starts) | Cannot deploy, scale, or modify workloads | Moderate (has occurred) |
#### Historical Examples
- AWS us-east-1 outages (multiple incidents affecting global services)
- Azure Active Directory outages (authentication failures)
- GCP global networking incidents
#### Contingency Response
- Immediate: Workloads continue on existing resources (no changes possible)
- Within 15 minutes: Activate sovereign standby if outage confirmed major
- Traffic: Shift to sovereign platform via DNS/load balancer
- Return: Can return to US cloud once restored (if policy allows)
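The key triage decision is distinguishing a control-plane-only outage from a full outage. A minimal sketch, assuming two independent probes (management API and a data-plane health endpoint) whose results are passed in as booleans:

```python
def classify_outage(management_api_ok: bool, data_plane_ok: bool) -> str:
    """Classify probe results per the contingency response above.

    - control-plane-outage: workloads still serve traffic but cannot be
      deployed, scaled, or modified -> hold, prepare sovereign standby
    - full-outage: nothing reachable -> treat as a Scenario 1 termination
      until proven otherwise, and initiate failover
    """
    if management_api_ok and data_plane_ok:
        return "healthy"
    if not management_api_ok and data_plane_ok:
        return "control-plane-outage"
    return "full-outage"
```

In practice the probes would run from outside the affected provider, so that the monitoring itself does not depend on the platform being assessed.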
### Scenario 3: Data Access/Exfiltration Order

**Trigger:** US government orders a cloud provider to hand over a government customer's data without that customer's knowledge (FISA Section 702, National Security Letter).
| Warning Time | Impact | Probability |
|---|---|---|
| No warning (gag order) | Data compromised without knowledge | High (documented occurrence) |
#### Mitigation (Pre-Incident)

This scenario cannot be detected or responded to after the fact; it can only be prevented.
- Move data off US cloud infrastructure entirely
- Encrypt all data with keys held in sovereign HSMs (US provider cannot decrypt)
- Minimise data stored on US cloud to non-sensitive operational data only
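The second mitigation can be sketched as envelope encryption: data stored on the US cloud is encrypted with a per-object data key, and that key is itself wrapped by a key-encryption key that never leaves the sovereign HSM. The sketch below is purely illustrative — it uses an XOR keystream as a stand-in for a real cipher (AES-GCM in practice), and `sovereign_kek` stands in for an HSM-resident key that would only ever be used via wrap/unwrap calls, never exported:

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    """XOR keystream stand-in for a real cipher; use AES-GCM in production."""
    return bytes(d ^ k for d, k in zip(data, key))

# Stand-in for a key held inside a sovereign HSM and never exported.
sovereign_kek = secrets.token_bytes(32)

def encrypt_for_us_cloud(plaintext: bytes) -> dict:
    dek = secrets.token_bytes(len(plaintext))  # fresh per-object data key
    return {
        # Both fields can safely sit on US infrastructure: the provider
        # holds ciphertext plus a wrapped key it cannot unwrap.
        "ciphertext": xor(plaintext, dek),
        "wrapped_dek": xor(dek, sovereign_kek[:len(dek)]),
    }

def decrypt(blob: dict) -> bytes:
    # Unwrapping requires the KEK, i.e. a call into the sovereign HSM.
    dek = xor(blob["wrapped_dek"], sovereign_kek[:len(blob["wrapped_dek"])])
    return xor(blob["ciphertext"], dek)

blob = encrypt_for_us_cloud(b"citizen records")
```

The design point is that a compelled US provider can only surrender ciphertext and a wrapped key; the unwrap operation happens in sovereign territory, under sovereign legal control.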
## High Availability Architecture (Pre-Migration)

During the migration period, critical systems must be deployed in an active-active or active-standby configuration across both US cloud and sovereign infrastructure, enabling near-instant failover.
Global traffic management (sovereign DNS / global load balancer) fronts both platforms:

- **US cloud (primary):** AWS / Azure / GCP
- **Sovereign (hot standby):** CloudStack / sovereign Kubernetes
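The traffic-management layer can be expressed as a small table of load-balancer weights per posture. The postures and percentages below are illustrative assumptions, not values from the source:

```python
def routing_weights(mode: str) -> dict[str, int]:
    """Global load-balancer weights for each operating posture.

    The small steady-state sovereign share keeps the hot standby
    continuously exercised rather than discovering faults at failover time.
    """
    postures = {
        "normal":      {"us_cloud": 90, "sovereign": 10},
        "degraded":    {"us_cloud": 50, "sovereign": 50},
        "us_shutdown": {"us_cloud": 0,  "sovereign": 100},
    }
    return postures[mode]
```

Keeping the posture table in version control means the "us_shutdown" flip is a reviewed, pre-approved change rather than an improvised one.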
### Recovery Objectives

#### Continuous Replication Requirements
| Data Type | Replication Method | Lag Target | Technology |
|---|---|---|---|
| Databases | Logical replication (async) | < 5 minutes | PostgreSQL logical replication, Debezium CDC |
| Object Storage | Continuous sync | < 15 minutes | rclone with change detection, MinIO bucket replication |
| Application State | Distributed cache sync | < 1 minute | Redis Cluster cross-region, Kafka mirroring |
| Secrets/Config | Periodic sync | < 1 hour | OpenBao replication, GitOps |
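The lag targets in the table above lend themselves to a simple replication monitor; a minimal sketch, with observed lags supplied in seconds by whatever metrics pipeline is in use:

```python
# Lag targets from the replication requirements table, in seconds.
LAG_TARGETS = {
    "databases": 5 * 60,
    "object_storage": 15 * 60,
    "application_state": 60,
    "secrets_config": 60 * 60,
}

def lag_breaches(observed: dict[str, float]) -> list[str]:
    """Return the data types whose observed replication lag exceeds target."""
    return [dtype for dtype, lag in observed.items()
            if lag > LAG_TARGETS[dtype]]

breaches = lag_breaches({
    "databases": 720,          # 12 min: breaches the 5-minute target
    "object_storage": 300,
    "application_state": 30,
    "secrets_config": 900,
})
```

A breach here matters doubly: it is both an operational alert and a live measure of how much data would be lost if the US cloud disappeared at that moment.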
## US Shutdown Incident Response Procedure

**Trigger:** US cloud services inaccessible or termination notice received
- Automated monitoring alerts on US cloud API failures
- Manual escalation if external notification received
- Incident commander assigned immediately
### Determine scope and intentionality
- Confirm outage is real (not monitoring false positive)
- Check cloud provider status pages
- Classify: Technical outage vs. deliberate termination
- If deliberate: Escalate to senior leadership immediately
### Authorise failover to sovereign infrastructure
- Technical outage expected > 1 hour: Initiate failover
- Deliberate termination: Initiate failover immediately
- Notify all jurisdiction coordination centres
### Traffic redirect to sovereign platform
- Update DNS records (low TTL should be pre-configured)
- Update global load balancer weights
- Verify sovereign endpoints accepting traffic
- Monitor error rates and latency
### Confirm sovereign platform operational
- All critical services confirmed operational
- Data integrity verification
- Capacity scaling as needed
- Stakeholder communication issued
### Determine whether return to US cloud is appropriate (technical outage only)
- If deliberate termination: No return. Accelerate full migration.
- If technical outage resolved: Assess risk of return vs. staying on sovereign
- Recommendation: Use incident to justify permanent sovereign migration
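The authorisation and return rules in the procedure above reduce to a small decision function; a sketch, with the one-hour threshold taken from the runbook:

```python
def failover_decision(deliberate: bool, expected_outage_minutes: int) -> dict:
    """Encode the failover authorisation rules from the procedure above."""
    if deliberate:
        # Deliberate termination: fail over now and do not return.
        return {"failover": True, "return_to_us_cloud": False,
                "note": "accelerate full migration"}
    if expected_outage_minutes > 60:
        # Technical outage expected to exceed one hour: initiate failover.
        return {"failover": True, "return_to_us_cloud": "assess-after-restore",
                "note": "technical outage expected > 1 hour"}
    # Short technical outage: ride it out on existing resources.
    return {"failover": False, "return_to_us_cloud": True,
            "note": "ride out short technical outage"}
```

Codifying the rule means the incident commander authorises an outcome the organisation has already agreed to, rather than debating thresholds mid-incident.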
## Sovereign DR Site Strategy by Jurisdiction
| Jurisdiction | Primary Site | DR Site | Cross-Jurisdiction DR |
|---|---|---|---|
| UK | Crown Hosting (Corsham) | Crown Hosting (Farnborough) | EU (OVHcloud France) - data agreement required |
| EU | Primary varies by member state | Secondary within same member state | Cross-member-state replication (Gaia-X) |
| Canada | SSC Borden (Ontario) | SSC Gatineau (Quebec) | UK (bilateral agreement possible) |
| Australia | Canberra DC | Sydney/Melbourne | Limited (geographic isolation); consider NZ partnership |
## DR Testing Requirements
| Test Type | Frequency | Scope | Success Criteria |
|---|---|---|---|
| Failover Test | Monthly | Single critical service | Failover completed within RTO; no data loss beyond RPO |
| Full DR Exercise | Quarterly | All critical services | Full service restoration on sovereign platform |
| US Shutdown Simulation | Semi-annually (every 6 months) | Complete cutover simulation | Zero US cloud dependency for 24 hours |
| Cross-Jurisdiction Test | Annually | Multi-jurisdiction coordination | Coordinated failover across cooperative |