Blue-Green Deployment

Zero-downtime deployment strategy using two identical production environments (Blue and Green) to enable instant rollbacks, reduce risk, and allow thorough testing before directing traffic.

What is Blue-Green Deployment?

Blue-Green Deployment is a release strategy that reduces downtime and risk by running two identical production environments. At any time, one serves live traffic while the other is idle or being updated.

Simple analogy: a concert venue with two stages. While one act performs on the first stage, the crew sets up the second for the next act. When it is ready, the spotlights switch over instantly.

The Problem: Risky Deployments

Traditional deployment:
1. Take site offline
2. Deploy new version
3. Test in production
4. If broken, scramble to fix
5. Site down for 10-60 minutes

Problem: Downtime + risk of broken production

Blue-Green solution:

1. Deploy to idle environment
2. Test thoroughly
3. Switch traffic instantly (0 downtime)
4. If broken, switch back instantly (0 downtime rollback)

How It Works

The Environments

Blue Environment (Active):
- Running version 1.0
- Serving 100% user traffic
- Stable, proven version

Green Environment (Idle):
- Updated to version 2.0
- No user traffic
- Ready for testing

Deployment Process

mermaid
graph TB
    subgraph Before[Before the switch]
        U1[Users] -->|100% traffic| LB1[Load Balancer]
        LB1 --> B1[Blue Environment<br/>v1.0 ACTIVE]
        LB1 -.->|No traffic| G1[Green Environment<br/>v2.0 IDLE]
        Deploy[Deploy v2.0] --> G1
        Test[Test v2.0] --> G1
    end

    subgraph After[After the switch]
        U2[Users] -->|100% traffic| LB2[Load Balancer]
        LB2 --> G2[Green Environment<br/>v2.0 ACTIVE]
        LB2 -.->|No traffic| B2[Blue Environment<br/>v1.0 IDLE]
    end

    LB1 ==>|Switch Traffic| LB2

Steps:

  1. Blue is active (serving production traffic with v1.0)
  2. Deploy to Green (install v2.0 on idle environment)
  3. Test Green (QA team validates v2.0 works)
  4. Switch load balancer (point traffic to Green)
  5. Monitor (watch metrics for errors)
  6. Keep Blue (Blue v1.0 becomes rollback environment)
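
The six steps above can be sketched as a small state machine. This is a minimal in-memory illustration rather than a real load balancer integration: `BlueGreenRouter`, `deploy`, `switchTraffic`, and `rollback` are hypothetical names, and the `smokeTest` callback stands in for whatever validation the QA step actually runs.

```javascript
// Minimal in-memory model of a blue-green switch (hypothetical names throughout).
class BlueGreenRouter {
  constructor() {
    // Step 1: Blue starts active with v1.0; Green starts idle.
    this.environments = { blue: "1.0", green: null };
    this.active = "blue";
  }

  get idle() {
    return this.active === "blue" ? "green" : "blue";
  }

  // Step 2: install the new version on the idle environment only.
  deploy(version) {
    this.environments[this.idle] = version;
  }

  // Steps 3-4: validate the idle environment, then flip the pointer.
  switchTraffic(smokeTest) {
    const candidate = this.idle;
    if (!smokeTest(candidate, this.environments[candidate])) {
      throw new Error(`Smoke test failed on ${candidate}; traffic stays on ${this.active}`);
    }
    this.active = candidate;
  }

  // Step 6: the previous environment was left untouched, so rollback is the same flip.
  rollback() {
    this.active = this.idle;
  }

  servingVersion() {
    return this.environments[this.active];
  }
}

const router = new BlueGreenRouter();
router.deploy("2.0");
router.switchTraffic((env, version) => version === "2.0");
console.log(router.active, router.servingVersion()); // green 2.0
router.rollback();
console.log(router.active, router.servingVersion()); // blue 1.0
```

The point of the sketch is that deploying never touches the active environment, and that switching and rolling back are the same cheap operation: changing which environment the router points at.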

Implementation Examples

AWS with ELB

yaml
# infrastructure.yml
Resources:
  BlueEnvironment:
    Type: AWS::ElasticBeanstalk::Environment
    Properties:
      ApplicationName: my-app
      EnvironmentName: my-app-blue
      VersionLabel: v1.0
      
  GreenEnvironment:
    Type: AWS::ElasticBeanstalk::Environment
    Properties:
      ApplicationName: my-app
      EnvironmentName: my-app-green
      VersionLabel: v2.0
      
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: my-app-lb
      
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Targets:
        - Id: !Ref BlueEnvironment  # Initially points to Blue

Deployment script:

bash
#!/bin/bash
# deploy.sh

# 1. Deploy to Green
aws elasticbeanstalk update-environment \
  --environment-name my-app-green \
  --version-label v2.0

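# 2. Wait for deployment (the waiter wraps describe-environments, which takes --environment-names)
aws elasticbeanstalk wait environment-updated \
  --environment-names my-app-green

# 3. Health check (HealthStatus requires enhanced health reporting)
HEALTH=$(aws elasticbeanstalk describe-environment-health \
  --environment-name my-app-green \
  --attribute-names HealthStatus | jq -r '.HealthStatus')

if [ "$HEALTH" != "Ok" ]; then
  echo "Green environment unhealthy. Aborting."
  exit 1
fi

# 4. Run smoke tests (arguments after "--" reach the script; abort on failure)
npm run smoke-test -- --env=green || exit 1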

# 5. Switch traffic (swap environment URLs)
aws elasticbeanstalk swap-environment-cnames \
  --source-environment-name my-app-blue \
  --destination-environment-name my-app-green

echo "Traffic switched to Green (v2.0)"
echo "Blue (v1.0) available for rollback"

Kubernetes

yaml
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:1.0
        ports:
        - containerPort: 8080

---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:2.0
        ports:
        - containerPort: 8080

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: blue  # Initially routes to Blue
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Switch traffic:

bash
# Switch from Blue to Green
kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback to Blue if needed
kubectl patch service app-service -p '{"spec":{"selector":{"version":"blue"}}}'

Docker Compose + Nginx

yaml
# docker-compose.yml
version: '3'
services:
  app-blue:
    image: myapp:1.0
    container_name: app-blue
    ports:
      - "8001:8080"
      
  app-green:
    image: myapp:2.0
    container_name: app-green
    ports:
      - "8002:8080"
      
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf

nginx
# nginx.conf (initially points to Blue)
# An events block is required when replacing the main nginx.conf
events {}

http {
  upstream backend {
    server app-blue:8080;
    # server app-green:8080;  # Commented out
  }
  
  server {
    listen 80;
    location / {
      proxy_pass http://backend;
    }
  }
}

Switch script:

bash
#!/bin/bash
# switch.sh

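ACTIVE=$1  # "blue" or "green"

if [ "$ACTIVE" != "blue" ] && [ "$ACTIVE" != "green" ]; then
  echo "Usage: $0 blue|green" >&2
  exit 1
fi

cat > nginx.conf <<EOF
events {}

http {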
  upstream backend {
    server app-${ACTIVE}:8080;
  }
  
  server {
    listen 80;
    location / {
      proxy_pass http://backend;
    }
  }
}
EOF

# "nginx" is the Compose service name; -T disables TTY allocation for scripted use
docker compose exec -T nginx nginx -s reload
echo "Switched to ${ACTIVE}"

Advanced Patterns

1. Canary Deployment

Gradually shift traffic to the new version instead of switching all at once.

yaml
# Kubernetes with Istio
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
  - app
  http:
  - match:
    - headers:
        user-type:
          exact: beta
    route:
    - destination:
        host: app
        subset: green
      weight: 100
  - route:
    - destination:
        host: app
        subset: blue
      weight: 90
    - destination:
        host: app
        subset: green
      weight: 10  # 10% of production traffic to Green

Gradual rollout:

Step 1: 1% to Green  (test with minimal risk)
Step 2: 10% to Green (validate performance)
Step 3: 50% to Green (half and half)
Step 4: 100% to Green (full cutover)
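
The gradual rollout above can be sketched as a loop that raises Green's traffic weight stage by stage and falls back to 0% as soon as a health check fails. This is a hedged illustration only: `chooseSubset`, `runCanary`, and the `isHealthy` callback are hypothetical names, and a real setup would apply the weights through the service mesh or load balancer rather than in application code.

```javascript
// Hypothetical sketch of a staged canary rollout with a health gate between stages.
const STAGES = [1, 10, 50, 100]; // percent of traffic sent to Green at each step

// Route one request: `roll` is a number in [0, 100), e.g. Math.random() * 100.
function chooseSubset(greenWeight, roll) {
  return roll < greenWeight ? "green" : "blue";
}

// Advance through the stages; drop back to 0% Green as soon as a check fails.
function runCanary(stages, isHealthy) {
  let greenWeight = 0;
  for (const weight of stages) {
    greenWeight = weight;
    if (!isHealthy(greenWeight)) {
      return { greenWeight: 0, completed: false, failedAt: weight };
    }
  }
  return { greenWeight, completed: true, failedAt: null };
}

console.log(chooseSubset(10, 4.2));  // green
console.log(chooseSubset(10, 57.0)); // blue

// Healthy at every stage: the rollout reaches 100% Green.
console.log(runCanary(STAGES, () => true));

// Unhealthy once Green takes 50%: the rollout stops and all traffic returns to Blue.
console.log(runCanary(STAGES, (weight) => weight < 50));
```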

2. Rolling Deployment

Replace instances in small batches rather than all at once (RollingUpdate is the default Kubernetes Deployment strategy).

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  selector:              # Required: must match the pod template labels
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max 1 extra pod during update
      maxUnavailable: 1   # Max 1 pod down during update
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:2.0

Process:

10 pods running v1.0
1. Create 1 new pod (v2.0)  → 11 pods total
2. Terminate 1 old pod → 10 pods (1 v2.0, 9 v1.0)
3. Repeat until all v2.0
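
The surge-then-terminate loop described above can be traced with a short simulation. This is a simplified sketch, not how the Kubernetes controller is implemented: `simulateRollingUpdate` is a hypothetical function, it assumes every new pod becomes ready immediately, and it ignores `maxUnavailable`.

```javascript
// Simulates the simple surge-then-terminate loop: add new pods, then remove old ones.
// Assumption: new pods are ready instantly, so old pods can be terminated right away.
function simulateRollingUpdate(replicas, maxSurge) {
  let v1 = replicas; // pods still running the old version
  let v2 = 0;        // pods running the new version
  const snapshots = [];

  while (v1 > 0) {
    // Create up to maxSurge new pods, never more than remain to be replaced.
    const batch = Math.min(maxSurge, v1);
    v2 += batch;
    snapshots.push({ v1, v2, total: v1 + v2 });

    // Terminate the same number of old pods to return to the desired count.
    v1 -= batch;
    snapshots.push({ v1, v2, total: v1 + v2 });
  }
  return snapshots;
}

const steps = simulateRollingUpdate(10, 1);
console.log(steps[0]);                // { v1: 10, v2: 1, total: 11 }
console.log(steps[1]);                // { v1: 9, v2: 1, total: 10 }
console.log(steps[steps.length - 1]); // { v1: 0, v2: 10, total: 10 }
```

Unlike Blue-Green, both versions serve traffic at the same time for the whole duration of the loop, which is why rollback is slower: it means running the same loop in reverse.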

3. Feature Flags (Dark Launch)

Deploy code to production but keep features disabled.

javascript
// Feature flag backend
const featureFlags = {
  newCheckout: {
    enabled: false,
    rollout: 0  // 0% of users
  }
};

// Application code
app.get('/checkout', (req, res) => {
  const flag = featureFlags.newCheckout;
  
  if (flag.enabled && Math.random() * 100 < flag.rollout) {
    // New checkout (v2.0 code)
    return res.render('checkout-v2');
  }
  
  // Old checkout (v1.0 code)
  return res.render('checkout-v1');
});

Gradual activation:

javascript
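// Start with the flag off
featureFlags.newCheckout = { enabled: false, rollout: 0 };

// Internal testing: 1% of users
featureFlags.newCheckout = { enabled: true, rollout: 1 };

// Beta users: 10%
featureFlags.newCheckout = { enabled: true, rollout: 10 };

// Full release: 100%
featureFlags.newCheckout = { enabled: true, rollout: 100 };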
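
One caveat with the `Math.random()` check in the application code above is that it re-rolls on every request, so the same user can flip between the old and new checkout. A common alternative is to hash a stable identifier into a bucket. The sketch below assumes a Node.js runtime; `bucketFor` and `isEnabled` are hypothetical helper names, and the user ID is assumed to come from the session.

```javascript
const nodeCrypto = require("node:crypto");

// Map a user ID to a stable bucket in [0, 100) so the same user always
// gets the same answer for a given flag (unlike Math.random() per request).
function bucketFor(flagName, userId) {
  const digest = nodeCrypto
    .createHash("sha256")
    .update(`${flagName}:${userId}`)
    .digest();
  return digest.readUInt32BE(0) % 100;
}

// A user sees the feature when the flag is on and their bucket is below the rollout percentage.
function isEnabled(flag, flagName, userId) {
  return flag.enabled && bucketFor(flagName, userId) < flag.rollout;
}

const partial = { enabled: true, rollout: 10 };
const first = isEnabled(partial, "newCheckout", "user-42");
const second = isEnabled(partial, "newCheckout", "user-42");
console.log(first === second); // true

console.log(isEnabled({ enabled: true, rollout: 100 }, "newCheckout", "user-42")); // true
console.log(isEnabled({ enabled: true, rollout: 0 }, "newCheckout", "user-42"));   // false
```

Including the flag name in the hash input means a user who lands in the first 10% for one feature is not automatically in the first 10% for every other feature.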

Comparison: Deployment Strategies

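| Strategy      | Downtime | Rollback Speed | Resource Cost     | Complexity |
|---------------|----------|----------------|-------------------|------------|
| Blue-Green    | None     | Instant        | High (2x servers) | Low        |
| Canary        | None     | Fast           | Medium            | Medium     |
| Rolling       | None     | Slow           | Low               | Low        |
| Recreate      | High     | Slow           | Low               | Very Low   |
| Feature Flags | None     | Instant        | Low               | High       |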

Pros & Cons

Blue-Green Deployment

Pros:

  • Zero downtime
  • Instant rollback (just switch back)
  • Full environment testing before switch
  • Simple to understand

Cons:

  • Requires 2x infrastructure (expensive)
  • Database migrations tricky (must be backward compatible)
  • Stateful applications need session management

When to Use

Good for:

  • Critical applications (banking, healthcare)
  • When rollback speed is priority
  • When you can afford 2x infrastructure

Bad for:

  • Stateful apps with complex state
  • Very large deployments (cost prohibitive)
  • Databases with breaking schema changes

Database Challenges

Problem: Schema Changes

Blue (v1.0): Expects column "name"
Green (v2.0): Expects column "full_name"

Can't switch instantly - data incompatible!

Solution: Backward-Compatible Migrations

Phase 1: Add new column

sql
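-- Run before the switch, while Blue is still serving traffic.
-- Adding a nullable column does not break v1.0.
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Backfill existing rows. During the transition the app writes to both columns.
UPDATE users SET full_name = name WHERE full_name IS NULL;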

Phase 2: Switch to Green

javascript
// v2.0 code works with both columns
const name = user.full_name || user.name;

Phase 3: Remove old column (later)

sql
-- After Blue decommissioned
ALTER TABLE users DROP COLUMN name;
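
Between Phase 1 and Phase 3, the application has to keep both columns in sync so that either environment can serve traffic. A minimal sketch of that dual-write logic follows; `toRow` and `displayName` are hypothetical helpers, and rows are plain objects here rather than results from a real database driver.

```javascript
// Expand phase: every write populates both the old and the new column,
// so Blue (reads `name`) and Green (reads `full_name`) see the same data.
function toRow(user) {
  return { id: user.id, name: user.fullName, full_name: user.fullName };
}

// Read path for v2.0: prefer the new column, fall back for rows not yet backfilled.
function displayName(row) {
  return row.full_name ?? row.name;
}

const written = toRow({ id: 1, fullName: "Ada Lovelace" });
console.log(written.name === written.full_name); // true

const legacyRow = { id: 2, name: "Alan Turing", full_name: null };
console.log(displayName(legacyRow)); // Alan Turing
```

Because writes from Green still populate `name`, switching back to Blue after the cutover does not lose data, which is what keeps the instant rollback safe.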

Monitoring & Validation

Pre-Switch Checklist

bash
#!/bin/bash
# validate-green.sh

echo "Running pre-switch validation..."

# 1. Health check
curl -f https://green.example.com/health || exit 1

# 2. Database connectivity
curl -f https://green.example.com/db-health || exit 1

# 3. Smoke tests
npm run smoke-test -- --env=green || exit 1

# 4. Load test
artillery run load-test.yml --environment green || exit 1

# 5. Manual approval
read -p "All checks passed. Switch to Green? (y/n) " -n 1 -r
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
  exit 1
fi

echo "Switching to Green..."

Post-Switch Monitoring

javascript
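// Monitor key metrics. monitor(), alert(), and rollback() are placeholders
// for your monitoring and deployment tooling.
const metrics = {
  errorRate: monitor('error_rate'),            // fraction of failed requests (0-1)
  responseTime: monitor('response_time_p99'),  // milliseconds
  requestCount: monitor('requests_per_second')
};

// Auto-rollback if metrics degrade
if (metrics.errorRate > 0.01) {  // more than 1% errors
  alert('High error rate detected!');
  rollback();
}

if (metrics.responseTime > 1000) {  // p99 above 1000 ms
  alert('High latency detected!');
  rollback();
}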
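
A single noisy sample should usually not flip production traffic. One way to make the auto-rollback check above more robust is to require several consecutive bad samples before acting. The sketch below uses the same thresholds (1% error rate, 1000 ms p99 latency); `findViolations`, `shouldRollback`, and the sample shape are hypothetical, and the window of three samples is an arbitrary assumption.

```javascript
// Thresholds mirror the checks above: 1% error rate, 1000 ms p99 latency.
const THRESHOLDS = { maxErrorRate: 0.01, maxP99Ms: 1000 };

// Returns the list of violated thresholds; an empty list means the sample is healthy.
function findViolations(sample, thresholds = THRESHOLDS) {
  const violations = [];
  if (sample.errorRate > thresholds.maxErrorRate) violations.push("error_rate");
  if (sample.p99Ms > thresholds.maxP99Ms) violations.push("response_time_p99");
  return violations;
}

// Roll back only after `consecutive` bad samples in a row.
function shouldRollback(samples, consecutive = 3) {
  let streak = 0;
  for (const sample of samples) {
    streak = findViolations(sample).length > 0 ? streak + 1 : 0;
    if (streak >= consecutive) return true;
  }
  return false;
}

const healthy = { errorRate: 0.002, p99Ms: 340 };
const degraded = { errorRate: 0.04, p99Ms: 1800 };

console.log(findViolations(degraded));                                // [ 'error_rate', 'response_time_p99' ]
console.log(shouldRollback([healthy, degraded, degraded, degraded])); // true
console.log(shouldRollback([degraded, healthy, degraded, degraded])); // false
```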

Real-World Examples

Netflix

Netflix is known for canary deployments (a gradual variant of Blue-Green). A representative rollout:
1. Deploy to 1 server
2. Monitor for 10 minutes
3. If good, deploy to 10% of fleet
4. If good, deploy to 50%
5. If good, deploy to 100%

Auto-rollback if any region shows increased errors

Amazon

Amazon uses Blue-Green for critical services:
- 2 identical ASGs (Auto Scaling Groups)
- Route53 weighted routing
- Gradual traffic shift: 5% → 25% → 50% → 100%
- Keep Blue running for 24 hours before decommissioning

Interview Tips 💡

When discussing deployment strategies in system design interviews:

  1. Explain zero downtime: "Blue-Green allows instant switching with zero downtime..."
  2. Discuss rollback: "If issues detected, we instantly switch back to Blue..."
  3. Mention cost: "Requires 2x infrastructure, but worth it for critical systems..."
  4. Database migrations: "Schema changes must be backward compatible using expand-contract pattern..."
  5. Alternatives: "For cost savings, we could use rolling deployment or canary..."
  6. Real examples: "Netflix uses canary deployment for gradual rollout..."
