Level 2 Technical Implementation Documentation

Transition Management: Coexistence Strategies

Audience: Programme Directors, Enterprise Architects, Migration Teams, Procurement
Purpose: Managing the transition period where systems operate across both US hyperscale and sovereign infrastructure

Large-scale cloud migrations do not happen overnight. Government departments will operate in a hybrid state for 2-5 years, with systems running simultaneously on US cloud providers and sovereign infrastructure. This page addresses the practical realities of this coexistence period.

Reality Check: The average enterprise cloud migration takes 3-5 years. Government systems—with their additional compliance requirements, legacy integrations, and procurement constraints—may take longer. Planning for extended coexistence is not defeatism; it's realism.

1. Projects In-Flight Decision Framework

At any given moment, government departments have dozens of cloud projects at various stages: procurement, development, testing, or recently launched. Each requires a decision about how to proceed.

Project Status Categories

Category           | Definition                                   | Typical Examples
-------------------|----------------------------------------------|-------------------------------------------------
Pre-Procurement    | Requirements defined, no contracts signed    | New citizen services, modernisation initiatives
In Procurement     | ITT issued or contract negotiations underway | Major transformation programmes
In Development     | Actively being built, not yet live           | Digital services in beta
Recently Launched  | Live <12 months, still stabilising           | New platforms, API services
Established        | Live >12 months, stable operation            | Core departmental systems
Legacy/End-of-Life | Scheduled for retirement within 24 months    | Systems being replaced

Decision Matrix: What To Do With Each Project

Pre-Procurement Projects

REDIRECT to sovereign infrastructure

  • Update requirements to mandate sovereign-compatible architecture
  • Add sovereign cloud deployment as primary target environment
  • Require open standards (Kubernetes, S3-compatible, PostgreSQL)
  • No additional cost if done before procurement

In Procurement Projects

PAUSE & ASSESS

  • If ITT not yet issued: Add sovereign requirements, may delay 2-4 weeks
  • If in evaluation: Score sovereign-readiness as weighted criterion
  • If in contract negotiation: Add migration clause and exit provisions
  • Cost: £50k-200k in delays, but avoids £2-10M migration later

In Development Projects

ASSESS ARCHITECTURE

  • Cloud-agnostic architecture: Continue, plan sovereign deployment post-launch
  • Light proprietary services: Continue with migration plan for specific services
  • Deep proprietary lock-in: Pause and re-architect if <40% complete
  • Near completion: Launch on US cloud, immediate migration planning

Recently Launched Projects (<12 months)

STABILISE THEN MIGRATE

  • Do not migrate during stabilisation period (creates additional risk)
  • Begin migration planning and architecture assessment immediately
  • Target migration window: 6-18 months post-launch
  • Add telemetry to understand actual usage patterns for migration planning

Established Systems

PRIORITISE BY RISK

  • Classify by data sensitivity and criticality
  • High sensitivity + high criticality = Priority 1 migration
  • Schedule in migration waves per overall programme
  • May operate in hybrid state for extended period

Legacy/End-of-Life Systems

DO NOT MIGRATE

  • Continue on current platform until retirement
  • Ensure replacement system targets sovereign infrastructure
  • Exception: If retirement date slips past 24 months, reassess
  • Maintain enhanced monitoring for security incidents
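The decision matrix above can be sketched as a small lookup function. This is an illustrative sketch only, not an official policy engine: the category names and headline actions mirror this page, while the function signature and threshold parameters are assumptions made for the example.

```python
# Sketch of the in-flight decision matrix. Thresholds (40% complete,
# 24-month retirement horizon) come from the sections above.

def recommend(category: str, *, lock_in: str = "none",
              percent_complete: int = 0, months_to_retirement: int = 0) -> str:
    """Return the headline action for a project, per the decision matrix."""
    if category == "Pre-Procurement":
        return "REDIRECT to sovereign infrastructure"
    if category == "In Procurement":
        return "PAUSE & ASSESS"
    if category == "In Development":
        # Deep lock-in is only worth re-architecting while the build is young.
        if lock_in == "deep" and percent_complete < 40:
            return "PAUSE and re-architect"
        return "ASSESS ARCHITECTURE (continue with migration plan)"
    if category == "Recently Launched":
        return "STABILISE THEN MIGRATE (6-18 months post-launch)"
    if category == "Established":
        return "PRIORITISE BY RISK (migration waves)"
    if category == "Legacy/End-of-Life":
        # Reassess if the retirement date slips past the 24-month horizon.
        if months_to_retirement > 24:
            return "REASSESS (retirement slipped)"
        return "DO NOT MIGRATE (retire in place)"
    raise ValueError(f"unknown category: {category}")

print(recommend("In Development", lock_in="deep", percent_complete=30))
# prints: PAUSE and re-architect
```

Encoding the matrix this way also makes the portfolio review repeatable: the same inputs always yield the same recommendation, which matters when dozens of projects are assessed by different teams.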

2. Extended Parallel Operation Patterns

For complex systems, the transition period may extend 12-36 months. During this time, the system operates in both environments simultaneously. This is not a bug—it's a feature that reduces risk and enables gradual confidence building.

Parallel Operation Models

Model A: Read Replica

Low Risk

US cloud remains primary. Sovereign infrastructure receives read-only replica of data. Used for reporting, analytics, and building operational confidence.

  • One-way data flow (US → Sovereign)
  • No sovereignty benefit until cutover
  • Lowest risk, easiest rollback
  • Good for: Initial validation phase

Model B: Traffic Split

Medium Risk

Both environments serve live traffic. Percentage gradually shifts from US to sovereign. Both write to their own data stores with reconciliation.

  • Requires robust load balancing
  • Data reconciliation complexity
  • Partial sovereignty benefit during transition
  • Good for: Stateless services, APIs
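A minimal sketch of Model B's percentage split, assuming traffic can be keyed on a stable request attribute such as a user ID. Hashing that attribute means each user sticks to one environment as the sovereign share ramps up, keeping sessions and caches coherent; the 0-99 bucket scheme is an illustrative convention, not a standard.

```python
# Deterministic percentage routing for a traffic-split migration.
import hashlib

def route(user_id: str, sovereign_percent: int) -> str:
    """Assign a user a stable bucket in 0..99; below the threshold = sovereign."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "sovereign" if bucket < sovereign_percent else "us-cloud"

# Ramping 10% -> 50% only *adds* users to sovereign; nobody flips back,
# because each user's bucket is fixed and only the threshold moves.
assert route("alice", 0) == "us-cloud"
assert route("alice", 100) == "sovereign"
```

In practice the same idea would live in the load balancer (weighted upstreams or a consistent-hash policy), but the property to preserve is the one shown: raising the percentage must never move an already-migrated user back.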

Model C: Active-Active

High Complexity

Both environments are fully operational with bidirectional data synchronisation. Either can serve any request. True multi-cloud operation.

  • Complex conflict resolution required
  • Highest operational overhead
  • Maximum resilience during transition
  • Good for: Critical 24/7 services

Model D: Strangler Fig

Recommended

New features built on sovereign. Existing features migrated incrementally. Old system gradually "strangled" as functionality moves.

  • No big-bang cutover
  • Each component migrates independently
  • Can take 2-3 years for complex systems
  • Good for: Monolithic applications
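The strangler fig pattern reduces, at the routing layer, to a prefix table that grows over time: migrated paths go to the sovereign build, everything else falls through to the legacy monolith. The paths below are illustrative assumptions.

```python
# Sketch of a strangler-fig routing table. As each component migrates,
# its path prefix is added here; the legacy system serves the remainder.

MIGRATED_PREFIXES = ["/api/payments", "/api/profile"]   # grows over 2-3 years

def upstream(path: str) -> str:
    """Route a request path to the sovereign build or the legacy system."""
    if any(path.startswith(p) for p in MIGRATED_PREFIXES):
        return "sovereign"
    return "legacy-us-cloud"    # default: everything not yet strangled

assert upstream("/api/payments/123") == "sovereign"
assert upstream("/api/search") == "legacy-us-cloud"
```

The attraction is that rollback per component is one list entry: removing a prefix sends that traffic back to the legacy system without touching anything else.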

Extended Parallel Operation Timeline

MONTH    1    3    6    9    12   18   24   30   36
         │    │    │    │    │    │    │    │    │
US CLOUD ████████████████████████████████░░░░░░░░░░  (100% → 0%)
         │    │    │    │    │    │    │    │    │
SOVEREIGN ░░░░░░░░░░░░░░████████████████████████████  (0% → 100%)
         │    │    │    │    │    │    │    │    │
         │    │    │    │    │    │    │    │    │
PHASE:   │PREP│PILOT    │RAMP-UP    │PRIMARY      │COMPLETE
         │    │         │           │             │
DATA:    │    │ Read    │ Bi-dir    │ Sovereign   │ US
         │    │ Replica │ Sync      │ Primary     │ Decomm
        
Key Principle: The sovereign environment should be capable of running 100% of traffic before any cutover begins. The parallel period is for building confidence and validating operations—not for completing the technical migration.

3. Data Synchronisation Strategies

Maintaining data consistency across two cloud environments is the most technically challenging aspect of extended parallel operation. The strategy depends on data characteristics and consistency requirements.

Synchronisation Patterns

Pattern                   | Latency      | Consistency      | Complexity | Use Case
--------------------------|--------------|------------------|------------|----------------------
Change Data Capture (CDC) | Seconds      | Eventual         | Medium     | Database replication
Event Sourcing            | Seconds      | Eventual         | High       | Event-driven systems
Dual-Write                | Milliseconds | Strong (if sync) | Very High  | Critical transactions
Batch Sync                | Hours        | Point-in-time    | Low        | Analytics, reporting
Message Queue             | Seconds      | At-least-once    | Medium     | Async workflows

Change Data Capture (CDC) Implementation

CDC is the recommended pattern for most database synchronisation scenarios. It captures changes at the database level and streams them to the target environment.

# Example: PostgreSQL CDC with Debezium to sovereign infrastructure

-- 1. Source database: enable logical replication.
-- Note: AWS RDS does not permit ALTER SYSTEM; set the equivalent via a DB
-- parameter group (rds.logical_replication = 1 implies wal_level = logical).
-- On self-managed PostgreSQL:
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 4;
ALTER SYSTEM SET max_wal_senders = 4;

# 2. Debezium source connector (runs in sovereign Kubernetes, Strimzi CRD)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: cdc-source-connector
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  config:
    database.hostname: source-db.xxx.eu-west-2.rds.amazonaws.com
    database.port: 5432
    database.user: cdc_user
    database.password: ${CDC_PASSWORD}
    database.dbname: production
    database.server.name: aws-source
    plugin.name: pgoutput
    slot.name: debezium_slot
    publication.name: dbz_publication
    # Route through a secure tunnel - NOT the public internet
    database.sslmode: verify-full

# 3. Sink connector (writes to sovereign PostgreSQL)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: cdc-sink-connector
spec:
  class: io.confluent.connect.jdbc.JdbcSinkConnector
  config:
    connection.url: jdbc:postgresql://sovereign-db:5432/production
    connection.user: app_user
    connection.password: ${SINK_PASSWORD}
    topics.regex: aws-source\..*
    insert.mode: upsert
    pk.mode: record_key
    auto.create: false
    auto.evolve: false
    # Debezium topic names carry schema prefixes (e.g. aws-source.public.orders);
    # a RegexRouter transform is typically needed to map topics to table names.

Conflict Resolution for Bidirectional Sync

When both environments can write, conflicts will occur. Define resolution rules upfront:

Conflict Type        | Resolution Strategy                  | Example
---------------------|--------------------------------------|-----------------------
Simultaneous update  | Last-write-wins with vector clock    | User profile updates
Delete vs update     | Delete wins (or soft-delete only)    | Record removal
Constraint violation | Reject and alert, manual resolution  | Unique key conflict
Schema mismatch      | Queue for review, do not auto-apply  | New column in one env

Critical Warning: Bidirectional synchronisation with strong consistency across geographic regions and cloud providers is extremely complex. Consider whether you truly need it, or whether a simpler model (sovereign-primary with US read-replica) would suffice during transition.
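As a simplified sketch of the last-write-wins strategy in the table, the fragment below uses a (timestamp, site_id) pair as a total order. The table recommends vector clocks for production use; a bare timestamp needs synchronised clocks and a deterministic tie-break like this one, or the two sites can resolve the same conflict differently. All field names here are illustrative.

```python
# Last-write-wins with a deterministic tie-break (simplified sketch).
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    timestamp: float   # wall-clock seconds; assumes NTP-synchronised clocks
    site_id: str       # "sovereign" or "us-cloud" breaks exact timestamp ties

def resolve(a: tuple[Version, dict], b: tuple[Version, dict]) -> dict:
    """Pick the winning record; (timestamp, site_id) gives a total order,
    so both sites resolve the same conflict identically."""
    (va, ra), (vb, rb) = a, b
    return ra if (va.timestamp, va.site_id) >= (vb.timestamp, vb.site_id) else rb

us = (Version(100.0, "us-cloud"), {"name": "old"})
sov = (Version(100.5, "sovereign"), {"name": "new"})
assert resolve(us, sov) == {"name": "new"}   # later write wins
```

The key property is symmetry: resolve(a, b) and resolve(b, a) return the same record, which is what keeps the two environments from diverging after independent conflict resolution.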

4. Contract & Commercial Management

Government departments have existing contractual commitments with AWS, Azure, and GCP—often multi-year enterprise agreements with committed spend. The transition must account for these commercial realities.

Contract Situation Assessment

Contract Type                       | Typical Terms                     | Exit Considerations
------------------------------------|-----------------------------------|-----------------------------------------------------
Enterprise Discount Programme (EDP) | 3-5 years, committed annual spend | Early termination penalties; negotiate wind-down
Reserved Instances                  | 1-3 years, specific capacity      | Non-refundable; use until expiry or sell on marketplace
Savings Plans                       | 1-3 years, flexible capacity      | Use for remaining workloads; cannot transfer
G-Cloud Call-offs                   | Up to 24 months per call-off      | Standard termination clauses; 30-90 day notice
Direct Award                        | Variable                          | Review specific terms; may have break clauses

Commercial Transition Strategies

Strategy 1: Run Down Commitments

Continue paying committed spend while migrating workloads. Use remaining capacity for non-sensitive workloads, dev/test, or disaster recovery until commitment expires.

  • Pros: No penalty payments; maintains vendor relationship
  • Cons: Continued dependency; dual running costs
  • Timeline: Aligned to contract expiry (1-5 years)

Strategy 2: Negotiate Early Exit

Approach vendor to negotiate termination. May involve paying portion of remaining commitment (typically 50-80%) in exchange for immediate release.

  • Pros: Clean break; faster transition
  • Cons: Significant one-time cost; difficult negotiation
  • When: Emergency scenario; strategic imperative

Strategy 3: Renegotiate Terms

Use upcoming renewal as leverage to negotiate flexibility. Add migration clauses, reduce committed spend, or convert to pay-as-you-go for new workloads.

  • Pros: No immediate cost; improved terms
  • Cons: Requires negotiating leverage; vendor may resist
  • When: 6-12 months before renewal
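A back-of-envelope comparison of Strategy 1 (run down) against Strategy 2 (negotiate early exit) can be sketched as below. The 50-80% buyout range comes from the text above; all monetary figures are placeholders, and the model deliberately ignores costs common to both paths (sovereign running costs are incurred either way).

```python
# Rough run-down vs early-exit comparison; figures are illustrative.

def run_down_cost(annual_commit: float, years_left: float,
                  dual_run_annual: float) -> float:
    """Keep paying the commitment plus dual-running overhead until expiry."""
    return (annual_commit + dual_run_annual) * years_left

def early_exit_cost(annual_commit: float, years_left: float,
                    buyout_fraction: float = 0.65) -> float:
    """One-off payment of a negotiated share of the remaining commitment."""
    return annual_commit * years_left * buyout_fraction

remaining = run_down_cost(2_000_000, 3, dual_run_annual=500_000)  # £7.5M
exit_now = early_exit_cost(2_000_000, 3)                           # ~£3.9M
print(f"run down: £{remaining:,.0f}  early exit: £{exit_now:,.0f}")
```

The comparison usually turns on the dual-running overhead: with little overlap between old and new estates, running down the commitment tends to win; with heavy duplication, an early exit can pay for itself quickly.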

Data Egress Cost Planning

Cloud providers charge for data leaving their networks. At government scale, egress costs can be substantial:

Data Volume | AWS Egress Cost | Azure Egress Cost | GCP Egress Cost
------------|-----------------|-------------------|----------------
100 TB      | ~$8,500         | ~$8,500           | ~$8,000
1 PB        | ~$50,000        | ~$50,000          | ~$45,000
10 PB       | ~$250,000       | ~$250,000         | ~$200,000

Cost Mitigation: For very large data migrations, consider AWS Snowball/Azure Data Box for physical transfer (avoids egress charges), or negotiate egress fee waivers as part of contract exit discussions.
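The figures in the table follow from tiered per-GB pricing, which can be sketched with a marginal-rate calculation. The tier boundaries and $/GB rates below are illustrative assumptions chosen to reproduce the table's order of magnitude; real pricing varies by provider, region, and negotiated discounts.

```python
# Rough egress estimator with illustrative marginal tiers.

TIERS = [  # (up_to_TB, usd_per_GB) - applied marginally, like tax bands
    (10, 0.09),
    (150, 0.085),
    (500, 0.06),
    (1000, 0.04),
    (float("inf"), 0.02),
]

def egress_cost_usd(total_tb: float) -> float:
    """Sum cost across tiers, charging each band at its own rate."""
    cost, prev_cap = 0.0, 0.0
    for cap, rate in TIERS:
        band_tb = min(total_tb, cap) - prev_cap
        if band_tb <= 0:
            break
        cost += band_tb * 1000 * rate   # 1 TB ~ 1000 GB for estimation
        prev_cap = cap
    return cost

print(f"100 TB ≈ ${egress_cost_usd(100):,.0f}")
# prints: 100 TB ≈ $8,550
```

The marginal structure explains why the effective per-GB rate falls with volume: the 10 PB figure works out near $0.025/GB against $0.085/GB at 100 TB, which is also why egress waivers are worth negotiating at the top end.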

5. Industry Lessons Learned

Several major organisations have undertaken large-scale cloud migrations or repatriations. Their experiences provide valuable lessons.

Dropbox: AWS to Private Infrastructure (2016-2018)

Scale: ~500 PB of data, serving 500M users

Duration: 2.5 years

Approach:

  • Built custom infrastructure called "Magic Pocket" while still on AWS
  • Ran in parallel for 18+ months before cutover
  • Migrated metadata first, then gradually shifted block storage
  • Kept some services on AWS (non-core functionality)

Key Lessons:

  • Extended parallel operation is essential—Dropbox ran dual for nearly 2 years
  • Build the destination fully before starting migration
  • Migrate in order of increasing criticality (test with less critical first)
  • Savings of ~$75M over 2 years justified the investment

37signals (Basecamp/Hey): AWS to Private Cloud (2022-2023)

Scale: ~$3.2M annual AWS spend, tens of servers

Duration: ~18 months planning to completion

Approach:

  • Purchased physical servers, colocated in datacentres
  • Used their own deployment tooling (mrsk, later renamed Kamal) rather than Kubernetes
  • Migrated application-by-application over several months
  • Maintained AWS for specific services (S3 for some assets)

Key Lessons:

  • Smaller scale made "big bang" per-application feasible
  • 5-year payback on hardware investment
  • Operational complexity increased—needed more in-house expertise
  • Some hybrid state may be permanent (pragmatic approach)

Capital One: Data Centre to AWS (2012-2020)

Scale: Large US bank, 1000+ applications

Duration: 8 years (full exit from data centres)

Approach:

  • Started with non-critical workloads in 2012
  • Gradually moved more sensitive workloads as confidence grew
  • Closed last data centre in 2020
  • Heavy investment in cloud-native transformation (not lift-and-shift)

Key Lessons (Reverse-applicable):

  • 8-year timeline for complete migration—government should plan similarly
  • Regulatory complexity (banking) extended timelines significantly
  • Cultural change was as important as technical migration
  • Some applications were retired rather than migrated

Danish Government: Microsoft to Open Source (2017-Ongoing)

Scale: National government IT infrastructure

Duration: Ongoing, multi-year programme

Approach:

  • Phased replacement of Microsoft Office with LibreOffice
  • Migration of email systems to open platforms
  • Development of shared open-source components
  • Parallel operation during extended transition

Key Lessons:

  • User training and change management as important as technology
  • Document format compatibility requires long parallel period
  • Departmental autonomy created inconsistent adoption
  • Central mandate with local flexibility worked best

6. Hybrid Steady-State: Systems That May Never Fully Migrate

Some systems may remain on US cloud infrastructure indefinitely due to technical, commercial, or practical constraints. This is acceptable if properly managed.

Candidates for Permanent Hybrid State

Category             | Examples                                                          | Rationale                                                  | Mitigation
---------------------|-------------------------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------------------
Deep Vendor Lock-in  | Systems using AWS Lambda extensively, Azure Cosmos DB, GCP BigQuery | Refactoring cost exceeds benefit; 2-5 year rewrite required | Scheduled replacement with sovereign-native; enhanced monitoring
Third-Party SaaS     | Salesforce, ServiceNow, Workday (hosted on US cloud)               | Vendor choice, not government's; no sovereign equivalent    | Data minimisation; API abstraction layer; evaluate alternatives at renewal
External Integration | Systems that must integrate with US-based partners                 | Partner systems are on US cloud; latency requirements       | Gateway/proxy architecture; data classification review
Niche Services       | Specialised AI/ML services, specific compliance tools              | No sovereign equivalent exists or is immature               | Isolate sensitive data; use for processing only, not storage
End-of-Life Systems  | Legacy applications scheduled for retirement                       | Migration investment not justified for remaining lifespan   | Enhanced security monitoring; accelerate replacement if possible

Managing Permanent Hybrid State

Acceptable Hybrid Criteria:
  • System does not process Tier 1 (TOP SECRET) or Tier 2 (SECRET) data
  • System is not critical national infrastructure
  • Data can be reconstituted from sovereign sources if access is lost
  • Business impact of 72-hour outage is manageable
  • System is documented in risk register with ministerial acceptance
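The checklist above can be expressed as a single gate, useful when reviewing many candidate systems consistently. This is an illustrative sketch: the field names are assumptions, and the tier numbering follows this page's scheme (Tier 1 = TOP SECRET, Tier 2 = SECRET).

```python
# Sketch: evaluate the "acceptable hybrid" checklist for one system.
from dataclasses import dataclass

@dataclass
class HybridCandidate:
    data_tier: int                  # 1 = TOP SECRET, 2 = SECRET, 3+ = lower
    critical_national_infra: bool
    data_reconstitutable: bool      # recoverable from sovereign sources
    outage_72h_manageable: bool
    in_risk_register: bool          # with ministerial acceptance

def acceptable_hybrid(c: HybridCandidate) -> bool:
    """All five criteria must hold; any single failure vetoes hybrid state."""
    return (c.data_tier > 2
            and not c.critical_national_infra
            and c.data_reconstitutable
            and c.outage_72h_manageable
            and c.in_risk_register)

assert acceptable_hybrid(HybridCandidate(3, False, True, True, True))
assert not acceptable_hybrid(HybridCandidate(2, False, True, True, True))
```

The point of the conjunction is that the criteria are not trade-offs: a system processing SECRET data cannot compensate with a good risk register entry.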

Hybrid Architecture Pattern

SOVEREIGN INFRASTRUCTURE

+------------------+  +------------------+  +----------------------+
| Core Systems     |  | Citizen Data     |  | Sensitive Processing |
| (Kubernetes)     |  | (PostgreSQL)     |  | (Isolated)           |
+------------------+  +------------------+  +----------------------+
          |
+-----------------------------------------------------+
| API Gateway (Kong/APISIX)                            |
| All external traffic routes here                     |
+-----------------------------------------------------+
          |
  Secure Tunnel (WireGuard/IPsec)
  Encrypted, logged, monitored
          |
US CLOUD (Residual)

+-----------------------------------------------------+
| Proxy/Cache - no direct citizen access               |
+-----------------------------------------------------+
+------------------+  +------------------+  +------------------+
| Legacy App A     |  | SaaS Integration |  | ML Processing    |
| (Locked-in)      |  | (Salesforce)     |  | (Non-sensitive)  |
+------------------+  +------------------+  +------------------+

Constraints: No citizen PII • No classified data • Logged access

7. Operational Considerations During Transition

Monitoring & Observability

During coexistence, unified monitoring across both environments is essential: metrics, logs, and traces from both clouds should feed a single observability stack, hosted on the sovereign side, so that incidents spanning the two environments can be correlated in one place.
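One common way to achieve this with the tooling already implied by this page's Kubernetes examples is Prometheus federation: a sovereign-side Prometheus scrapes aggregate series from a Prometheus running in the US cloud environment. The hostnames below are illustrative placeholders, and in line with the rest of this page the federation scrape should travel over the secure tunnel, not the public internet.

```yaml
# Sovereign Prometheus: federate metrics from the US-cloud Prometheus.
scrape_configs:
  - job_name: 'federate-us-cloud'
    scrape_interval: 30s
    honor_labels: true            # keep the source environment's labels
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'           # illustrative: pull all jobs; narrow in practice
    static_configs:
      - targets:
          - 'prometheus-us.internal.example:9090'   # reached via the tunnel
```

Federating into the sovereign side (rather than the reverse) means observability survives the emergency-cutover scenario in the incident table below: losing US cloud access loses fresh US-side metrics, but not the monitoring stack itself.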

Incident Response

Scenario Response
US cloud component fails Route traffic to sovereign if available; standard incident process
Sovereign component fails Route traffic to US cloud; investigate root cause; no different from any failover
Data sync failure Alert immediately; assess data divergence; may need to pause writes
US cloud access revoked Execute emergency cutover plan; accept data loss from last sync point
Security incident in US cloud Isolate immediately; do not replicate potentially compromised data

Team Structure During Transition

Recommendation: Establish a dedicated "Migration Ops" team responsible for the coexistence infrastructure. This team owns the sync mechanisms, monitoring, and cutover procedures—separate from teams running either environment day-to-day.

Related Documentation