What is Blue-Green Deployment?
Blue-Green Deployment is a release strategy that reduces downtime and risk by running two identical production environments. At any time, one serves live traffic while the other is idle or being updated.
Simple analogy: A concert venue with two stages. While one act performs on the first stage, the second is set up for the next act, and the spotlight switches over instantly.
The Problem: Risky Deployments
Traditional deployment:
1. Take site offline
2. Deploy new version
3. Test in production
4. If broken, scramble to fix
5. Site down for 10-60 minutes
Problem: Downtime + risk of broken production
Blue-Green solution:
1. Deploy to idle environment
2. Test thoroughly
3. Switch traffic instantly (0 downtime)
4. If broken, switch back instantly (0 downtime rollback)
How It Works
The Environments
Blue Environment (Active):
- Running version 1.0
- Serving 100% user traffic
- Stable, proven version
Green Environment (Idle):
- Updated to version 2.0
- No user traffic
- Ready for testing
Deployment Process
graph TB
Users[Users] -->|100% traffic| LB[Load Balancer]
LB --> Blue[Blue Environment<br/>v1.0 ACTIVE]
Green[Green Environment<br/>v2.0 IDLE] -.->|No traffic| LB
Deploy[Deploy v2.0] --> Green
Test[Test v2.0] --> Green
Switch[Switch Traffic] --> LB
LB2[Load Balancer] --> Green2[Green Environment<br/>v2.0 ACTIVE]
Blue2[Blue Environment<br/>v1.0 IDLE] -.->|No traffic| LB2
Steps:
- Blue is active (serving production traffic with v1.0)
- Deploy to Green (install v2.0 on idle environment)
- Test Green (QA team validates v2.0 works)
- Switch load balancer (point traffic to Green)
- Monitor (watch metrics for errors)
- Keep Blue (Blue v1.0 becomes rollback environment)
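The steps above can be modeled as a small in-memory sketch. This is not tied to any load balancer API: `BlueGreenRouter`, `release`, `smokeTest`, and `isHealthy` are hypothetical names chosen for illustration, and the two hooks stand in for real QA and monitoring checks.

```javascript
// Minimal in-memory model of a blue-green switch (all names are illustrative).
class BlueGreenRouter {
  constructor() {
    this.environments = { blue: 'v1.0', green: null };
    this.active = 'blue';
  }

  // The environment that is not receiving traffic.
  get idle() {
    return this.active === 'blue' ? 'green' : 'blue';
  }

  // Install a new version on the idle environment only.
  deployToIdle(version) {
    this.environments[this.idle] = version;
  }

  // Point traffic at the idle environment; the old one becomes the rollback target.
  switchTraffic() {
    if (this.environments[this.idle] === null) {
      throw new Error('Idle environment has nothing deployed');
    }
    this.active = this.idle;
  }

  // Version that currently serves users.
  liveVersion() {
    return this.environments[this.active];
  }
}

// Runs the steps: deploy to idle, test, switch, monitor, roll back on failure.
function release(router, version, { smokeTest, isHealthy }) {
  router.deployToIdle(version);
  if (!smokeTest(router.idle)) {
    return { switched: false, live: router.liveVersion() };
  }
  router.switchTraffic();
  if (!isHealthy()) {
    router.switchTraffic(); // Rollback: flip back to the previous environment
    return { switched: false, rolledBack: true, live: router.liveVersion() };
  }
  return { switched: true, live: router.liveVersion() };
}

const router = new BlueGreenRouter();
const result = release(router, 'v2.0', {
  smokeTest: () => true,
  isHealthy: () => true,
});
console.log(result); // { switched: true, live: 'v2.0' }
```

Because the previous environment is left untouched after the switch, rollback is the same operation as the switch itself, only in the other direction.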
Implementation Examples
AWS with ELB
# infrastructure.yml
Resources:
  BlueEnvironment:
    Type: AWS::ElasticBeanstalk::Environment
    Properties:
      ApplicationName: my-app
      EnvironmentName: my-app-blue
      VersionLabel: v1.0
  GreenEnvironment:
    Type: AWS::ElasticBeanstalk::Environment
    Properties:
      ApplicationName: my-app
      EnvironmentName: my-app-green
      VersionLabel: v2.0
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: my-app-lb
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Targets:
        # Simplified for illustration: a real target group registers instance IDs,
        # IP addresses, a Lambda function, or an ALB, not an Elastic Beanstalk environment.
        - Id: !Ref BlueEnvironment # Initially points to Blue
Deployment script:
#!/bin/bash
# deploy.sh
set -euo pipefail # Abort on the first failed step so a bad build is never swapped in
# 1. Deploy to Green
aws elasticbeanstalk update-environment \
  --environment-name my-app-green \
  --version-label v2.0
# 2. Wait for deployment (the waiter takes the plural --environment-names)
aws elasticbeanstalk wait environment-updated \
  --environment-names my-app-green
# 3. Health check (HealthStatus reports Ok/Warning/...; Status reports Ready/Updating/...)
HEALTH=$(aws elasticbeanstalk describe-environment-health \
  --environment-name my-app-green \
  --attribute-names HealthStatus | jq -r '.HealthStatus')
if [ "$HEALTH" != "Ok" ]; then
  echo "Green environment unhealthy. Aborting."
  exit 1
fi
# 4. Run smoke tests ("--" forwards the flag to the script instead of npm)
npm run smoke-test -- --env=green
# 5. Switch traffic (swap environment URLs)
aws elasticbeanstalk swap-environment-cnames \
  --source-environment-name my-app-blue \
  --destination-environment-name my-app-green
echo "Traffic switched to Green (v2.0)"
echo "Blue (v1.0) available for rollback"
Kubernetes
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myapp:1.0
          ports:
            - containerPort: 8080
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myapp:2.0
          ports:
            - containerPort: 8080
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue # Initially routes to Blue
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
Switch traffic:
# Switch from Blue to Green
kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'
# Rollback to Blue if needed
kubectl patch service app-service -p '{"spec":{"selector":{"version":"blue"}}}'
Docker Compose + Nginx
# docker-compose.yml
version: '3'
services:
  app-blue:
    image: myapp:1.0
    container_name: app-blue
    ports:
      - "8001:8080"
  app-green:
    image: myapp:2.0
    container_name: app-green
    ports:
      - "8002:8080"
  nginx:
    image: nginx:latest
    container_name: nginx # Fixed name so "docker exec nginx ..." can find the container
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
# nginx.conf (initially points to Blue)
events {} # Required: Nginx refuses to start without an events section
http {
  upstream backend {
    server app-blue:8080;
    # server app-green:8080; # Commented out
  }
  server {
    listen 80;
    location / {
      proxy_pass http://backend;
    }
  }
}
Switch script:
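#!/bin/bash
# switch.sh
ACTIVE=$1 # "blue" or "green"
if [ "$ACTIVE" != "blue" ] && [ "$ACTIVE" != "green" ]; then
  echo "Usage: $0 blue|green" >&2
  exit 1
fi
cat > nginx.conf <<EOF
events {}
http {
  upstream backend {
    server app-${ACTIVE}:8080;
  }
  server {
    listen 80;
    location / {
      proxy_pass http://backend;
    }
  }
}
EOF
# Assumes the Nginx container is named "nginx"
docker exec nginx nginx -s reload
echo "Switched to ${ACTIVE}"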
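The same config generation can be expressed as a pure function, which makes the switch easy to unit test before anything is reloaded. This is a minimal sketch, not part of the original setup: `renderNginxConf` and `ENVIRONMENTS` are hypothetical names, and the output includes an `events {}` block because Nginx requires one in its main configuration file.

```javascript
const ENVIRONMENTS = ['blue', 'green'];

// Builds the nginx.conf text that routes all traffic to the requested environment.
function renderNginxConf(active) {
  if (!ENVIRONMENTS.includes(active)) {
    throw new Error(`Unknown environment "${active}"; expected blue or green`);
  }
  return [
    'events {}',
    'http {',
    '  upstream backend {',
    `    server app-${active}:8080;`,
    '  }',
    '  server {',
    '    listen 80;',
    '    location / {',
    '      proxy_pass http://backend;',
    '    }',
    '  }',
    '}',
    '',
  ].join('\n');
}

console.log(renderNginxConf('green'));
```

Rejecting unknown names up front prevents a typo from producing a config that points at a container that does not exist.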
Advanced Patterns
1. Canary Deployment
Gradually shift traffic instead of switching it all at once.
# Kubernetes with Istio
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
    - app
  http:
    - match:
        - headers:
            user-type:
              exact: beta
      route:
        - destination:
            host: app
            subset: green
          weight: 100
    - route:
        - destination:
            host: app
            subset: blue
          weight: 90
        - destination:
            host: app
            subset: green
          weight: 10 # 10% of production traffic to Green
Gradual rollout:
Step 1: 1% to Green (test with minimal risk)
Step 2: 10% to Green (validate performance)
Step 3: 50% to Green (half and half)
Step 4: 100% to Green (full cutover)
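The rollout schedule can be sketched as a weighted pick per request plus a loop that advances only while the canary stays healthy. This is an illustration under simple assumptions, not how Istio implements routing: `pickSubset`, `runRollout`, and the `isHealthyAt` hook are hypothetical names, and the hook stands in for real metric checks.

```javascript
// Rollout schedule from the steps above: percentage of traffic sent to Green.
const ROLLOUT_STEPS = [1, 10, 50, 100];

// Picks a subset for one request. `roll` is a number in [0, 100).
function pickSubset(greenWeight, roll = Math.random() * 100) {
  return roll < greenWeight ? 'green' : 'blue';
}

// Walks the schedule; at the first unhealthy step, all traffic returns to Blue.
function runRollout(isHealthyAt) {
  for (const step of ROLLOUT_STEPS) {
    if (!isHealthyAt(step)) {
      return { greenWeight: 0, completed: false, failedAt: step };
    }
  }
  return { greenWeight: 100, completed: true };
}

console.log(pickSubset(10, 5)); // green
console.log(pickSubset(10, 10)); // blue
console.log(runRollout(() => true)); // { greenWeight: 100, completed: true }
console.log(runRollout((step) => step < 50)); // { greenWeight: 0, completed: false, failedAt: 50 }
```

Passing `roll` explicitly keeps the routing decision deterministic in tests, while production calls fall back to `Math.random()`.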
2. Rolling Deployment
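Replace instances in small batches instead of all at once (RollingUpdate is the Kubernetes default strategy).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # Max 1 extra pod during update
      maxUnavailable: 1 # Max 1 pod down during update
  selector: # Required by apps/v1; must match the template labels
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:2.0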
Process:
10 pods running v1.0
1. Create 1 new pod (v2.0) → 11 pods total
2. Terminate 1 old pod → 10 pods (1 v2.0, 9 v1.0)
3. Repeat until all v2.0
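The pod arithmetic in this process can be traced with a short simulation. It is a simplified sketch, not the Kubernetes controller's actual algorithm: it assumes new pods become ready immediately, ignores `maxUnavailable`, and `simulateRollingUpdate` is a hypothetical helper name.

```javascript
// Simulates the create-then-terminate loop and records pod counts after each action.
function simulateRollingUpdate(replicas, maxSurge) {
  if (maxSurge < 1) {
    throw new Error('This simplified model needs maxSurge >= 1 to make progress');
  }
  let oldPods = replicas;
  let newPods = 0;
  const history = [];
  while (oldPods > 0) {
    const batch = Math.min(maxSurge, oldPods);
    newPods += batch; // Create new pods: total rises above the replica count
    history.push({ oldPods, newPods, total: oldPods + newPods });
    oldPods -= batch; // Terminate the same number of old pods
    history.push({ oldPods, newPods, total: oldPods + newPods });
  }
  return history;
}

const history = simulateRollingUpdate(10, 1);
console.log(history[0]); // { oldPods: 10, newPods: 1, total: 11 }
console.log(history[1]); // { oldPods: 9, newPods: 1, total: 10 }
console.log(history[history.length - 1]); // { oldPods: 0, newPods: 10, total: 10 }
```

With 10 replicas and a surge of 1, the update takes 10 create/terminate rounds and never runs more than 11 pods at once, which is why rolling updates are cheap but slow to complete or undo.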
3. Feature Flags (Dark Launch)
Deploy code to production but keep features disabled.
const express = require('express');
const app = express();
// Feature flag backend
const featureFlags = {
  newCheckout: {
    enabled: false,
    rollout: 0 // 0% of users
  }
};
// Application code
app.get('/checkout', (req, res) => {
  const flag = featureFlags.newCheckout;
  if (flag.enabled && Math.random() * 100 < flag.rollout) {
    // New checkout (v2.0 code)
    return res.render('checkout-v2');
  }
  // Old checkout (v1.0 code)
  return res.render('checkout-v1');
});
Gradual activation:
// Start with 0%
rollout: 0
// Internal testing
rollout: 1 // 1% of users
// Beta users
rollout: 10
// Full release
rollout: 100
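One caveat with the `Math.random()` check is that the same user can land on different checkout versions from one request to the next. A common alternative is to hash a stable identifier into a bucket so each user gets a consistent answer. The sketch below assumes every request carries a stable user ID; `bucketFor` and `isEnabledFor` are hypothetical helper names, and SHA-256 is just one reasonable hash choice.

```javascript
const crypto = require('crypto');

// Maps a user ID to a stable bucket in [0, 100). Including the flag name
// means a user is not in the same bucket for every flag.
function bucketFor(userId, flagName) {
  const digest = crypto.createHash('sha256').update(`${flagName}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

// True when the flag is on and the user's bucket falls inside the rollout percentage.
function isEnabledFor(flag, flagName, userId) {
  return flag.enabled && bucketFor(userId, flagName) < flag.rollout;
}

const flag = { enabled: true, rollout: 10 };
const first = isEnabledFor(flag, 'newCheckout', 'user-42');
const second = isEnabledFor(flag, 'newCheckout', 'user-42');
console.log(first === second); // true: the same user always gets the same answer
```

Raising `rollout` from 10 to 50 keeps every user who was already enabled, because buckets below 10 are also below 50.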
Comparison: Deployment Strategies
| Strategy | Downtime | Rollback Speed | Resource Cost | Complexity |
|---|---|---|---|---|
| Blue-Green | None | Instant | High (2x servers) | Low |
| Canary | None | Fast | Medium | Medium |
| Rolling | None | Slow | Low | Low |
| Recreate | High | Slow | Low | Very Low |
| Feature Flags | None | Instant | Low | High |
Pros & Cons
Blue-Green Deployment
✅ Pros:
- Zero downtime
- Instant rollback (just switch back)
- Full environment testing before switch
- Simple to understand
❌ Cons:
- Requires 2x infrastructure (expensive)
- Database migrations tricky (must be backward compatible)
- Stateful applications need session management
When to Use
Good for:
- Critical applications (banking, healthcare)
- When rollback speed is priority
- When you can afford 2x infrastructure
Bad for:
- Stateful apps with complex state
- Very large deployments (cost prohibitive)
- Databases with breaking schema changes
Database Challenges
Problem: Schema Changes
Blue (v1.0): Expects column "name"
Green (v2.0): Expects column "full_name"
Can't switch instantly - data incompatible!
Solution: Backward-Compatible Migrations
Phase 1: Add new column
-- Deploy to Green, keep Blue running
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- App writes to both columns from now on; backfill the rows that already exist
UPDATE users SET full_name = name WHERE full_name IS NULL;
Phase 2: Switch to Green
// v2.0 code works with both columns
const name = user.full_name || user.name;
Phase 3: Remove old column (later)
-- After Blue decommissioned
ALTER TABLE users DROP COLUMN name;
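On the application side, the three phases come down to writing both columns, reading with a fallback, and dropping the old column only once nothing depends on it. The sketch below works on plain row objects rather than a real database driver; `buildUserWrite`, `readDisplayName`, and `canDropOldColumn` are hypothetical helper names.

```javascript
// Phase 1 (expand): write both columns so Blue (v1.0) and Green (v2.0) both keep working.
function buildUserWrite(fullName) {
  return { name: fullName, full_name: fullName };
}

// Phase 2: prefer the new column, fall back to the old one for rows not yet backfilled.
function readDisplayName(row) {
  return row.full_name || row.name;
}

// Phase 3 (contract): dropping "name" is only safe once every row has "full_name".
function canDropOldColumn(rows) {
  return rows.every((row) => Boolean(row.full_name));
}

const rows = [
  { name: 'Ada', full_name: null }, // Written by v1.0, not yet backfilled
  buildUserWrite('Grace Hopper'), // Written after the expand phase
];
console.log(rows.map(readDisplayName)); // [ 'Ada', 'Grace Hopper' ]
console.log(canDropOldColumn(rows)); // false
```

The check in `canDropOldColumn` mirrors the backfill query: while any row still has an empty `full_name`, the old column is still carrying data.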
Monitoring & Validation
Pre-Switch Checklist
#!/bin/bash
# validate-green.sh
echo "Running pre-switch validation..."
# 1. Health check
curl -f https://green.example.com/health || exit 1
# 2. Database connectivity
curl -f https://green.example.com/db-health || exit 1
# 3. Smoke tests ("--" forwards the flag to the script instead of npm)
npm run smoke-test -- --env=green || exit 1
# 4. Load test
artillery run load-test.yml --environment green || exit 1
# 5. Manual approval
read -p "All checks passed. Switch to Green? (y/n) " -n 1 -r
echo # Move to a new line after the single-key answer
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
  exit 1
fi
echo "Switching to Green..."
Post-Switch Monitoring
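// Monitor key metrics
// monitor(), alert(), and rollback() are placeholders for your monitoring and deployment tooling
const metrics = {
  errorRate: monitor('error_rate'), // Percent of failed requests
  responseTime: monitor('response_time_p99'), // Milliseconds
  requestCount: monitor('requests_per_second')
};
// Auto-rollback if metrics degrade
if (metrics.errorRate > 1) { // More than 1% errors
  alert('High error rate detected!');
  rollback();
}
if (metrics.responseTime > 1000) { // p99 latency above 1000 ms
  alert('High latency detected!');
  rollback();
}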
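To make the two thresholds concrete, the error rate and p99 latency can be computed from raw request samples. This is a minimal sketch under stated assumptions: each sample is a hypothetical `{ status, durationMs }` object, a 5xx status counts as an error, the percentile uses the nearest-rank method, and `errorRatePercent`, `percentile`, and `shouldRollback` are illustrative names.

```javascript
// Share of requests that failed with a 5xx status, as a percentage.
function errorRatePercent(samples) {
  if (samples.length === 0) return 0;
  const errors = samples.filter((s) => s.status >= 500).length;
  return (errors / samples.length) * 100;
}

// Nearest-rank percentile: the smallest value with at least p% of values at or below it.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Applies the thresholds from the text: more than 1% errors or p99 above 1000 ms.
function shouldRollback(samples, limits = { maxErrorRatePercent: 1, maxP99Ms: 1000 }) {
  const p99 = percentile(samples.map((s) => s.durationMs), 99);
  return errorRatePercent(samples) > limits.maxErrorRatePercent || p99 > limits.maxP99Ms;
}

const healthy = Array.from({ length: 100 }, () => ({ status: 200, durationMs: 100 }));
const failing = healthy.map((s, i) => (i < 2 ? { ...s, status: 500 } : s));
console.log(shouldRollback(healthy)); // false
console.log(shouldRollback(failing)); // true (2% errors)
```

In practice these values would usually come from a metrics system over a time window rather than from an in-memory array.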
Real-World Examples
Netflix
Netflix uses Canary deployment (gradual Blue-Green):
1. Deploy to 1 server
2. Monitor for 10 minutes
3. If good, deploy to 10% of fleet
4. If good, deploy to 50%
5. If good, deploy to 100%
Auto-rollback if any region shows increased errors
Amazon
Amazon uses Blue-Green for critical services:
- 2 identical ASGs (Auto Scaling Groups)
- Route53 weighted routing
- Gradual traffic shift: 5% → 25% → 50% → 100%
- Keep Blue running for 24 hours before decommissioning
Interview Tips 💡
When discussing deployment strategies in system design interviews:
- Explain zero downtime: "Blue-Green allows instant switching with zero downtime..."
- Discuss rollback: "If issues detected, we instantly switch back to Blue..."
- Mention cost: "Requires 2x infrastructure, but worth it for critical systems..."
- Database migrations: "Schema changes must be backward compatible using expand-contract pattern..."
- Alternatives: "For cost savings, we could use rolling deployment or canary..."
- Real examples: "Netflix uses canary deployment for gradual rollout..."
Related Concepts
- Load Balancing — Traffic switching mechanism
- Health Checks — Validating environment readiness
- Database Migrations — Backward-compatible schema changes
- CI/CD Pipelines — Automated deployment workflows
- Monitoring & Alerting — Detecting deployment issues
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.