Technology Validation

Evidence of technology readiness for CloudStack, open-source FaaS, and supporting components. Includes existing deployments, performance benchmarks, and go/no-go criteria.

Technical Expert Challenge

"You're betting €25B on technology that has never been proven at government scale. CloudStack has 200+ deployments but NO government deployment at AWS/Azure scale. OpenFaaS/Knative cold starts are 5-10x worse than Lambda. What's your rollback plan?"


1. Apache CloudStack - Technology Maturity Assessment

Platform Overview

Attribute Value Assessment
Project maturity Apache Top-Level Project since 2013 (12+ years) MATURE
Known deployments 150+ organisations (production) PROVEN
Scale capability Manages tens of thousands of physical servers ENTERPRISE-READY
Government deployments Limited public documentation; some known but not at AWS scale REQUIRES VALIDATION
Security certifications No FedRAMP-equivalent; ISO 27001 achievable GAP TO ADDRESS
Hypervisor support KVM, VMware, XenServer, Hyper-V FLEXIBLE

Enterprise Case Studies

AT&T (USA)
Scale: Enterprise-wide IaaS platform
Use case: Multi-tenant infrastructure for application teams; VMware and KVM workloads
Key features used: OpenTofu provider, GitOps management, over-provisioning
Source: CloudStack Collaboration Conference 2023
IKOULA (France)
Scale: Public cloud provider
Use case: Web hosting, dedicated servers, cloud computing
Key features used: CloudStack + XCP-ng (open-source hypervisor)
Source: ShapeBlue Case Studies
Telia Latvia (Baltics)
Scale: Regional telecoms provider
Use case: Cloud and CDN services from Baltic datacenters
Key features used: Modern datacenter deployment
Source: Apache CloudStack Wiki

CloudStack Gaps and Mitigations

Gap Risk Level Mitigation
No FedRAMP-equivalent certification MEDIUM Commission UK NCSC assessment; work towards ISO 27001 + government-specific controls
Limited government-scale deployments documented MEDIUM Pilot programme specifically validates government scale; AT&T proves enterprise scale
Smaller vendor ecosystem than VMware/AWS MEDIUM ShapeBlue provides enterprise support; consider consortium support model
Some advanced features (AI acceleration) less mature LOW AI/ML workloads not initial priority; can develop as needed

2. Serverless (FaaS) - Performance Validation

Cold Start Performance Comparison

AWS Lambda: ~50-100ms
OpenFaaS (warm): ~50-150ms
OpenFaaS (cold): ~1-2 seconds
Knative (cold): ~4-6 seconds
Fission (pooled): ~100-200ms

Platform Comparison

Platform Cold Start Warm Performance Scalability Assessment
AWS Lambda 50-100ms Excellent Excellent (managed) BENCHMARK
OpenFaaS 1-2s (cold; scale-to-zero is opt-in) Good Good (gradual) ACCEPTABLE
Knative 4-6s Good Excellent (burst) BURST WORKLOADS
Fission 100-200ms (pooled) Good Good BEST COLD START

Cold Start Mitigation Strategies

  • OpenFaaS default: scale-to-zero is disabled out of the box, so latency-sensitive functions avoid cold starts unless operators opt in
  • Container pooling: pre-warming container pools (the Fission approach) reduces cold starts to milliseconds
  • Predictive scaling: ML models forecast traffic and pre-warm containers; published results report up to 85% tail-latency reduction
  • Hybrid approach: Use OpenFaaS for latency-sensitive, Knative for batch/async workloads
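The "keep warm" strategy above can be sketched as a simple pinger that periodically invokes latency-sensitive functions so their replicas are never scaled to zero. This is a minimal sketch, not a production implementation: the gateway URL and function names are illustrative assumptions, and a real deployment would use the platform's own idler configuration instead.

```python
import time
import urllib.request

# Hypothetical gateway URL and function names -- substitute your own deployment.
GATEWAY = "http://gateway.openfaas.local:8080"
WARM_FUNCTIONS = ["auth-check", "session-refresh"]  # latency-sensitive paths

def function_urls(gateway: str, functions: list[str]) -> list[str]:
    """Build the invocation URL for each function behind the gateway."""
    return [f"{gateway}/function/{fn}" for fn in functions]

def keep_warm(interval_seconds: int = 60) -> None:
    """Invoke latency-sensitive functions on a timer so their containers are
    never scaled to zero. Trade-off: a small baseline compute cost in exchange
    for avoiding the 1-2s cold start on the first real request."""
    while True:
        for url in function_urls(GATEWAY, WARM_FUNCTIONS):
            try:
                urllib.request.urlopen(url, timeout=5)  # lightweight ping
            except OSError:
                pass  # gateway unreachable; retry next cycle
        time.sleep(interval_seconds)
```

The ping interval should sit comfortably inside the platform's idle timeout; this is the "higher base cost" trade-off noted in the assessment below.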

Honest Assessment

Open-source FaaS cold starts are 10-50x slower than AWS Lambda's worst case. However:

  • Cold starts can be avoided by keeping functions warm (trade-off: higher base cost)
  • Many government workloads are async/batch where cold start is acceptable
  • Fission with pooling approaches Lambda-like performance
  • Not all workloads require sub-100ms response times

Recommendation: Accept higher latency for initial release; invest in optimisation R&D as part of capability development (Workstream A).

Cold Start User Impact Quantification

Which services would be degraded?

Service Type Current Latency Requirement Cold Start Impact User Experience Effect
API Gateways / Authentication <100ms expected HIGH - 2s cold start noticeable First login of day may feel slow. Mitigation: keep auth functions warm (£50K/year).
Form Submission Processing <500ms typical MEDIUM - 2s acceptable Users expect brief wait on form submit. 2s within tolerance.
Document Generation 1-5s typical LOW - cold start hidden Users already expect delay. No perceptible impact.
Batch/Async Processing Minutes to hours NONE No user-facing impact. Ideal for open-source FaaS.
Real-time Chat/Notifications <200ms required HIGH - unacceptable Must use persistent containers, not FaaS. ~15% of serverless workloads.
Static Content / CDN <50ms expected N/A - not FaaS Object storage (MinIO) handles this. No FaaS involvement.

Summary: Affected Services

  • ~15% of serverless workloads require <200ms and cannot tolerate cold starts → use persistent containers
  • ~25% of serverless workloads would benefit from warm functions → budget £100-200K/year for always-on
  • ~60% of serverless workloads are async/batch/document-gen → cold starts acceptable
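The triage behind those percentages can be expressed as a simple decision rule. The latency thresholds below are assumptions read off the service table above; a real classification would use measured P95 requirements per service, and the example service names are illustrative.

```python
def classify_workload(p95_requirement_ms: float) -> str:
    """Map a service's latency requirement to a FaaS deployment strategy."""
    if p95_requirement_ms < 200:
        return "persistent-container"  # cold starts unacceptable (~15%)
    if p95_requirement_ms < 1000:
        return "warm-faas"             # keep warm; budget for always-on (~25%)
    return "cold-faas"                 # async/batch; cold start acceptable (~60%)

# Illustrative services with assumed P95 requirements in milliseconds
services = {
    "auth-api": 100,
    "form-submit": 500,
    "pdf-generation": 5000,
    "nightly-batch": 3_600_000,
}
plan = {name: classify_workload(ms) for name, ms in services.items()}
```

Running the rule over a real service inventory would produce the per-service budget split quoted above.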

Citizen experience: For the average user on GOV.UK completing a transaction:

  • Page loads: No impact (static content via CDN)
  • Form submission: Possible 1-2s additional wait on the first submission of a session (acceptable)
  • PDF generation: No impact (already async)
  • Login: Possible 1-2s on first login; subsequent logins cached (mitigation: warm auth functions)

Net assessment: Minimal user-facing impact for most services. Budget £200K/year for warm functions on latency-sensitive paths. Real-time services (~15%) must use containers, not FaaS.

3. Supporting Technology Stack - Maturity Assessment

Component Open Source Option Production Readiness Government Use
Object Storage (S3) MinIO MATURE (S3-compatible, enterprise support) Yes - multiple government deployments
Relational Database PostgreSQL MATURE (30+ years, enterprise standard) Yes - GDS standard; used by the NHS
Message Queue Apache Kafka / RabbitMQ MATURE Yes - widely deployed
Identity/IAM Keycloak MATURE (Red Hat supported) Yes - UK government use cases
Secrets Management OpenBao MATURE Yes - common in secure deployments
Container Orchestration Kubernetes MATURE (CNCF graduated) Yes - standard across governments
Service Mesh Istio / Linkerd MATURE Yes - enterprise adoption
Monitoring Prometheus + Grafana MATURE (CNCF graduated) Yes - GDS standard monitoring
Caching Redis / Valkey MATURE Yes - standard component
Data Warehouse Apache Spark / Trino MATURE but complex Some - requires specialist skills
ML/AI Platform Kubeflow / MLflow MATURING Limited - emerging use cases

4. Technology Go/No-Go Gates

The following gates must be passed before committing to each technology component:

Gate 1: Security Assessment (Month 6)

Criterion Required Evidence Assessor
CloudStack security architecture review Independent pen test + architecture review NCSC / accredited assessor
Open source component security SBOM for all components; CVE response process Security team
HSM integration validation Successful HSM integration for key management Crypto specialists
Network isolation verification Multi-tenant isolation demonstrated Network security

Gate 2: Performance Validation (Month 12)

Criterion Required Evidence Threshold
Compute performance Benchmark suite (SPECint, etc.) Within 10% of equivalent AWS EC2
Storage performance MinIO IOPS/throughput testing Within 20% of S3 equivalent
Network performance Inter-zone latency, throughput <1ms same-zone; <5ms cross-zone
FaaS latency Cold/warm start measurements Warm: <200ms P95; Cold: acceptable for use case
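The Gate 2 FaaS latency check above amounts to sampling invocation times and comparing the P95 against the 200ms warm threshold. A minimal measurement harness might look like the following sketch; the invoke callable stands in for a real HTTP call to the deployed gateway, which is an assumption, not part of the source.

```python
import time

WARM_P95_THRESHOLD_MS = 200.0  # Gate 2 threshold for warm invocations

def p95(samples_ms: list[float]) -> float:
    """95th percentile via nearest-rank on the sorted samples."""
    ordered = sorted(samples_ms)
    index = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[index]

def measure(invoke, runs: int = 100) -> dict:
    """Time `invoke` repeatedly and report the P95 against the gate threshold."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        samples.append((time.perf_counter() - start) * 1000.0)
    observed = p95(samples)
    return {"p95_ms": observed, "pass": observed <= WARM_P95_THRESHOLD_MS}
```

Cold starts would be measured the same way after forcing a scale-to-zero, with the pass criterion set per use case rather than against the warm threshold.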

Gate 3: Scalability Validation (Month 18)

Criterion Required Evidence Threshold
Horizontal scaling Scale from 100 to 1,000 VMs in controlled test Linear scaling demonstrated
Management plane resilience CloudStack management failover test RTO <15 minutes
Multi-zone operation 3-zone deployment operational Cross-zone DR functional
10x load stress test Pilot workloads at 10x expected scale No degradation beyond 20%
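The Gate 3 linearity and degradation criteria above reduce to a single efficiency ratio: observed throughput at scale divided by the throughput an ideally linear system would deliver. A sketch, with illustrative numbers that are assumptions rather than pilot results:

```python
def scaling_efficiency(baseline: tuple, scaled: tuple) -> float:
    """Ratio of observed to ideal linear speed-up.
    baseline/scaled are (vm_count, throughput) pairs; 1.0 = perfectly linear."""
    (n0, t0), (n1, t1) = baseline, scaled
    ideal = t0 * (n1 / n0)  # throughput a perfectly linear system would reach
    return t1 / ideal

# Example: 100 VMs at 1,000 ops/s scaled to 1,000 VMs at 8,500 ops/s
eff = scaling_efficiency((100, 1000.0), (1000, 8500.0))
passes_gate = eff >= 0.8  # "no degradation beyond 20%" threshold from the table
```

An efficiency of 0.85 in this example would pass the gate; anything below 0.8 would fail the "no degradation beyond 20%" criterion.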

5. Fallback and Rollback Options

If CloudStack or open-source components fail validation, the following fallbacks exist:

Scenario Fallback Option Trade-off
CloudStack fails security assessment Option 1: OpenStack (more complex but larger community)
Option 2: Commercial sovereign cloud (OVHcloud, Scaleway)
Option 3: OCI Dedicated Region (Oracle, but on-prem/dedicated)
OpenStack: Higher complexity, more skills needed
Commercial: Less open source, vendor dependency
OCI: US company, but fully dedicated
FaaS performance unacceptable Option 1: Accept higher latency for non-critical functions
Option 2: Keep functions warm (higher cost)
Option 3: Use container-based approach instead of FaaS
Option 1: UX impact on some services
Option 2: 2-3x compute cost increase
Option 3: Developer experience changes
Integration failures between components Option 1: Extend pilot timeline
Option 2: Use managed services from European providers
Option 3: Reduce scope to proven components only
Option 1: Cost increase, delay
Option 2: Less sovereignty, vendor dependency
Option 3: Reduced capability set
Complete technology approach fails Return to status quo (AWS/Azure/GCP) with enhanced contractual protections and data localisation requirements Sovereignty risk remains; pilot costs sunk

OCI Dedicated Region as Parallel Path

Consideration: Oracle Cloud Infrastructure (OCI) offers "Dedicated Region" deployment where Oracle hardware is deployed in customer/government premises, managed by Oracle but fully isolated. This provides:

  • Full AWS/Azure feature parity
  • Data remains on-premise under government control
  • Trade-off: Still US company; operational dependency on Oracle

Recommendation: Include OCI Dedicated Region as a benchmark/comparison during the pilot. If the open-source approach fails, it provides a fallback that is more sovereign than public cloud, though less than full open-source independence.


Summary Assessment

Component Readiness Key Risk Recommendation
CloudStack IaaS AMBER Government-scale unproven Validate in pilot; enterprise scale proven
Open-source FaaS AMBER Cold start performance Accept trade-off or use pooling; consider Fission
Data services (MinIO, PG, etc.) GREEN Low - mature technologies Proceed with confidence
Identity (Keycloak) GREEN Low - proven in government Proceed with confidence
Container platform (K8s) GREEN Low - industry standard Proceed with confidence

Document Status

This technology validation provides honest assessment including gaps and risks. Key message: core technologies are mature; CloudStack and FaaS require validation at government scale through the pilot programme.

Version: 1.0 | Last updated: January 2026
