Microservices · Infrastructure · Networking · Intermediate

Service Discovery

Complete guide to microservice discovery in dynamic cloud environments, covering client-side vs server-side patterns, health checks, DNS-based discovery, and production implementations using Consul, etcd, Eureka, and Kubernetes services.

The Dynamic Infrastructure Problem

Old World (2005):
Web Server: 192.168.1.10 (hardcoded)
Database: 192.168.1.20 (never changes)
config.yml: DB_HOST=192.168.1.20 ✓

New World (2025 - Kubernetes):
Containers start and stop continuously (deploys, crashes, rescheduling)
Pod IP addresses change whenever a pod is replaced
Autoscaling adds/removes instances
config.yml: DB_HOST=??? ❌

Solution: Service Discovery, a dynamic phonebook for your infrastructure.


What is Service Discovery?

Service Discovery enables services to find and communicate with each other without hardcoded IPs, adapting automatically to infrastructure changes.

The Pattern

1. Service Registration:
   Service A starts → "I'm alive at 10.0.0.5:8080"

2. Health Checking:
   Registry pings Service A every 10s → Still alive?

3. Service Lookup:
   Service B needs Service A → Query registry

4. Connection:
   Service B connects to 10.0.0.5:8080
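The four steps above can be modeled with a toy in-memory registry. This is a minimal sketch, not any product's API. The `Registry` class, its method names, and the 30-second staleness window are illustrative assumptions. Step 2 is modeled here as heartbeats pushed by the instance, not as pings sent by the registry.

```python
import time


class Registry:
    """Toy service registry: register, heartbeat, and lookup (illustrative only)."""

    def __init__(self, stale_after=30.0, clock=time.monotonic):
        self.stale_after = stale_after  # Seconds without a heartbeat before an instance is hidden
        self.clock = clock
        self.instances = {}  # service name -> {"host:port": last_heartbeat}

    def register(self, service, address):
        # Step 1: the instance announces itself
        self.instances.setdefault(service, {})[address] = self.clock()

    def heartbeat(self, service, address):
        # Step 2: a periodic heartbeat refreshes the instance's liveness timestamp
        self.register(service, address)

    def deregister(self, service, address):
        self.instances.get(service, {}).pop(address, None)

    def lookup(self, service):
        # Step 3: return only instances seen within the staleness window
        now = self.clock()
        return sorted(
            addr
            for addr, seen in self.instances.get(service, {}).items()
            if now - seen <= self.stale_after
        )


# Usage with a fake clock so the example is deterministic
now = [0.0]
registry = Registry(stale_after=30.0, clock=lambda: now[0])
registry.register("service-a", "10.0.0.5:8080")
registry.register("service-a", "10.0.0.6:8080")
now[0] = 20.0
registry.heartbeat("service-a", "10.0.0.5:8080")
now[0] = 40.0  # 10.0.0.6 was last seen at t=0, so it is now stale
print(registry.lookup("service-a"))  # ['10.0.0.5:8080']
```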

Two Patterns

1. Client-Side Discovery

The client queries the registry and picks an instance itself.

mermaid
graph LR
    SA[Service A<br/>10.0.0.5] -.->|1. Register| R[Registry<br/>Consul/Eureka]
    SB[Service B] -->|2. Query| R
    R -->|3. IPs: 10.0.0.5,<br/>10.0.0.6| SB
    SB -->|4. Connect| SA

Implementation (Netflix Eureka):

java
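// Service A - Registration
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

// With spring-cloud-starter-netflix-eureka-client on the classpath, the app
// registers itself automatically. @EnableEurekaClient was removed in Spring Cloud
// Netflix 4.x; @EnableDiscoveryClient is optional but still supported.
@SpringBootApplication
@EnableDiscoveryClient
public class ServiceA {
    public static void main(String[] args) {
        SpringApplication.run(ServiceA.class, args);
    }
}

yaml
# application.yml (Service A)
spring:
  application:
    name: service-a  # Registered in Eureka as SERVICE-A
eureka:
  client:
    serviceUrl:
      defaultZone: http://eureka:8761/eureka/
  instance:
    preferIpAddress: true
    leaseRenewalIntervalInSeconds: 10  # Heartbeat interval (Eureka default: 30s)

java
// Service B - Discovery
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class ServiceB {
    @Autowired
    private DiscoveryClient discoveryClient;

    // RestTemplate is not auto-configured: declare it as a @Bean in a @Configuration class
    @Autowired
    private RestTemplate restTemplate;

    @GetMapping("/call-service-a")
    public String callServiceA() {
        // 1. Get instances from registry
        List<ServiceInstance> instances =
            discoveryClient.getInstances("SERVICE-A");

        if (instances.isEmpty()) {
            return "No instances available";
        }

        // 2. Client-side load balancing (random pick)
        ServiceInstance instance = instances.get(
            ThreadLocalRandom.current().nextInt(instances.size())
        );

        // 3. Make request
        String url = String.format("http://%s:%d/api/data",
            instance.getHost(), instance.getPort());

        return restTemplate.getForObject(url, String.class);
    }
}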
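The controller above queries the registry on every request and picks an instance at random. Client-side load balancers usually cache the instance list and refresh it periodically instead. The Python sketch below shows that idea with round-robin selection. `fetch_instances` is a hypothetical stand-in for any registry query (Eureka, Consul, and so on), and the 30-second refresh interval is an assumption.

```python
import itertools
import threading
import time


class RoundRobinBalancer:
    """Client-side load balancer over a cached, periodically refreshed instance list."""

    def __init__(self, fetch_instances, refresh_every=30.0, clock=time.monotonic):
        self.fetch_instances = fetch_instances  # Hypothetical registry query -> ["host:port", ...]
        self.refresh_every = refresh_every
        self.clock = clock
        self.lock = threading.Lock()
        self.cached = []
        self.cycle = iter(())
        self.fetched_at = None

    def choose(self):
        with self.lock:
            now = self.clock()
            stale = self.fetched_at is None or now - self.fetched_at >= self.refresh_every
            if stale:
                # Refresh the cached list and restart the rotation
                self.cached = list(self.fetch_instances())
                self.cycle = itertools.cycle(self.cached)
                self.fetched_at = now
            if not self.cached:
                raise RuntimeError("No instances available")
            return next(self.cycle)


# Usage with a fake registry and a frozen clock
calls = []


def fake_registry():
    calls.append(1)
    return ["10.0.0.5:8080", "10.0.0.6:8080"]


lb = RoundRobinBalancer(fake_registry, clock=lambda: 0.0)
picks = [lb.choose() for _ in range(4)]
print(picks)  # ['10.0.0.5:8080', '10.0.0.6:8080', '10.0.0.5:8080', '10.0.0.6:8080']
```

Caching saves registry round-trips, but it has a cost: a crashed instance can still be chosen until the next refresh.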

2. Server-Side Discovery

A load balancer (or router) queries the registry and forwards traffic; the client only knows the load balancer's address.

mermaid
graph LR
    SA[Service A<br/>10.0.0.5] -.->|1. Register| R[Registry<br/>etcd]
    SB[Service B] -->|2. Request| LB[Load Balancer]
    LB -->|3. Query| R
    R -->|4. IPs| LB
    LB -->|5. Forward| SA
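In server-side discovery, the registry lookup and the instance choice both live in the load balancer, so callers only ever address the balancer. The minimal Python sketch below illustrates that flow. `lookup` and `send` are hypothetical callables that stand in for a registry query and a network call.

```python
import random


class ServerSideBalancer:
    """Stands between callers and instances; callers never see instance addresses."""

    def __init__(self, lookup, send):
        self.lookup = lookup  # Hypothetical registry query: name -> healthy ["host:port", ...]
        self.send = send      # Hypothetical transport: (address, request) -> response

    def handle(self, service, request):
        instances = self.lookup(service)   # 3-4. Ask the registry for current instances
        if not instances:
            return {"status": 503, "body": f"no healthy instances of {service}"}
        target = random.choice(instances)  # Choose one instance
        return self.send(target, request)  # 5. Forward the request


# Usage with in-memory stand-ins for the registry and the network
registry = {"service-a": ["10.0.0.5:8080", "10.0.0.6:8080"]}
lb = ServerSideBalancer(
    lookup=lambda name: registry.get(name, []),
    send=lambda addr, req: {"status": 200, "body": f"{req} handled by {addr}"},
)
response = lb.handle("service-a", "GET /api/data")  # 2. Service B only talks to the LB
print(response["status"])  # 200
```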

Implementation (Kubernetes):

yaml
# Service A - Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
      - name: service-a
        image: service-a:1.0
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

---
# Service A - Service (Load Balancer)
apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  selector:
    app: service-a
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

---
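# Service B - Calls Service A
# Just use DNS: http://service-a/api/data (same namespace)
# or http://service-a.<namespace>.svc.cluster.local/api/data
# Kubernetes DNS resolves the name to the Service's ClusterIP (a virtual IP)
# kube-proxy forwards each connection on port 80 to port 8080 of a ready pod
# (only pods passing their readinessProbe are Service endpoints)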
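From Service B's side, Kubernetes discovery is just a DNS lookup. The Python sketch below assumes the default cluster domain `cluster.local` and the `default` namespace (both are configurable per cluster). `service_url` and `resolve` are illustrative helpers, not part of any Kubernetes client library.

```python
import socket


def service_url(name, namespace="default", port=80, cluster_domain="cluster.local"):
    """Build the fully qualified Service URL (<service>.<namespace>.svc.<domain>)."""
    return f"http://{name}.{namespace}.svc.{cluster_domain}:{port}"


def resolve(host, port=80):
    """Resolve a hostname to its IPv4 addresses (for a ClusterIP Service: the single VIP)."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


# Inside a pod, "service-a" alone also resolves via the search domains in /etc/resolv.conf
print(service_url("service-a"))  # http://service-a.default.svc.cluster.local:80
```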

Health Checks

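Problem: An instance can stay registered after it has crashed or become unhealthy, so callers keep getting routed to it.

Solution: Continuous health monitoring; instances that fail their checks are dropped from lookup results until they recover.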

Types of Health Checks

1. HTTP Health Check

javascript
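// Express.js
const express = require('express');
const app = express();

// Assumes `db` and `redis` are already-connected clients defined elsewhere,
// each exposing a ping() method that returns a Promise.

// Liveness: Is the process alive? Keep it cheap and dependency-free;
// otherwise a database outage makes the orchestrator restart every instance.
app.get('/health/alive', (req, res) => {
  res.status(200).send('OK');
});

// Readiness: Can it handle traffic right now?
app.get('/health/ready', async (req, res) => {
  try {
    // Check critical dependencies in parallel
    await Promise.all([db.ping(), redis.ping()]);

    res.status(200).send('Ready');
  } catch (error) {
    // 503 tells the registry/load balancer to stop routing here
    res.status(503).send('Not Ready');
  }
});

app.listen(8080);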
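On the registry or load-balancer side, a health checker polls an endpoint like this. It usually changes an instance's status only after several consecutive results agree, which avoids flapping. The stdlib-only Python sketch below uses thresholds of 3 failures to mark an instance unhealthy and 2 successes to recover. Those values are assumptions, similar in spirit to the Kubernetes probe fields `failureThreshold` and `successThreshold`. `probe` and `HealthState` are illustrative names.

```python
import urllib.error
import urllib.request


def probe(url, timeout=1.0):
    """One HTTP check: healthy only on a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):  # 503s, refused connections, timeouts
        return False


class HealthState:
    """Debounces raw probe results into a healthy/unhealthy status."""

    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self.streak = 0  # Consecutive results that disagree with the current status

    def record(self, ok):
        if ok == self.healthy:
            self.streak = 0
            return self.healthy
        self.streak += 1
        needed = self.success_threshold if ok else self.failure_threshold
        if self.streak >= needed:
            self.healthy = ok
            self.streak = 0
        return self.healthy


state = HealthState()
history = [state.record(ok) for ok in [False, False, False, True, True]]
print(history)  # [True, True, False, False, True]
```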

2. TCP Health Check

Consul agent service definition:

json
{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "tcp": "localhost:8080",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
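A TCP check only verifies that something accepts connections on the port. It says nothing about whether the application can actually serve requests. Conceptually, the check is a connect with a timeout, as in the stdlib-only sketch below. The `tcp_check` name is illustrative; Consul performs the equivalent connection attempt itself.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # Refused, unreachable, or timed out
        return False


# Usage: open a throwaway listener so the check has something to connect to
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # Port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
print(tcp_check("127.0.0.1", port))  # True
listener.close()
print(tcp_check("127.0.0.1", port))  # False
```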

3. TTL Health Check

python
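import time

import consul  # python-consul


def is_healthy():
    # Hypothetical self-check (e.g., ping the database); replace with real logic
    return True


c = consul.Consul()

# Register service with a TTL check: it must report in at least every 15s
c.agent.service.register(
    'my-service',
    service_id='my-service-1',
    port=8080,
    check=consul.Check.ttl('15s')
)

# Heartbeat loop. Consul names the service's check "service:<service_id>".
while True:
    try:
        if is_healthy():
            c.agent.check.ttl_pass('service:my-service-1')
        else:
            c.agent.check.ttl_fail('service:my-service-1')
    except Exception as exc:
        # Agent unreachable: if updates stop, the check goes critical once the TTL lapses
        print(f"TTL update failed: {exc}")
    time.sleep(10)  # Report well within the 15s TTL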
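The rule behind a TTL check is simple: the check passes only while updates keep arriving within the TTL, and it turns critical when they stop. The sketch below models that rule locally with an injectable clock. `TTLCheck` is a hypothetical class, not part of python-consul.

```python
import time


class TTLCheck:
    """Local model of a TTL check: passing only while updates arrive within `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.last_pass = None  # Consul TTL checks also start out critical
        self.failed = False

    def ttl_pass(self):
        self.last_pass = self.clock()
        self.failed = False

    def ttl_fail(self):
        self.failed = True

    def status(self):
        if self.failed or self.last_pass is None:
            return "critical"
        if self.clock() - self.last_pass > self.ttl:
            return "critical"  # Heartbeat overdue: the process may be hung or dead
        return "passing"


now = [0.0]
check = TTLCheck(ttl=15, clock=lambda: now[0])
states = [check.status()]  # Nothing reported yet
check.ttl_pass()
states.append(check.status())
now[0] = 10.0
states.append(check.status())
now[0] = 16.0  # 16s since the last update; the TTL is 15s
states.append(check.status())
print(states)  # ['critical', 'passing', 'passing', 'critical']
```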

Service Registry Implementations

1. Consul (HashiCorp)

python
import random

import consul  # python-consul
import requests

# Connect to the Consul agent (host and port are separate arguments)
c = consul.Consul(host='consul', port=8500)

# Register service with an HTTP health check every 10s
c.agent.service.register(
    'user-service',
    service_id='user-service-1',
    address='10.0.0.5',
    port=8080,
    tags=['v1', 'production'],
    check=consul.Check.http('http://10.0.0.5:8080/health', interval='10s')
)

# Discover service: only instances whose checks are all passing
def get_service_address(service_name):
    _, services = c.health.service(service_name, passing=True)

    if not services:
        raise RuntimeError(f"No healthy instances of {service_name}")

    # Client-side load balancing
    service = random.choice(services)

    # Service.Address is empty if the service registered without one; fall back to the node
    address = service['Service']['Address'] or service['Node']['Address']
    return f"http://{address}:{service['Service']['Port']}"

# Use
url = get_service_address('user-service')
response = requests.get(f"{url}/api/users", timeout=5)
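python-consul is a wrapper around Consul's HTTP API, so the same lookup also works with the standard library against `GET /v1/health/service/<name>?passing`. The minimal sketch below assumes an agent at `http://consul:8500`; `healthy_endpoints` and `fetch_healthy` are illustrative names. The sample response is trimmed to the fields the code reads.

```python
import json
import urllib.parse
import urllib.request


def healthy_endpoints(entries):
    """Turn a /v1/health/service response into ["host:port", ...]."""
    endpoints = []
    for entry in entries:
        # Service.Address is empty when the service registered without one
        host = entry["Service"]["Address"] or entry["Node"]["Address"]
        endpoints.append(f"{host}:{entry['Service']['Port']}")
    return endpoints


def fetch_healthy(service, consul="http://consul:8500"):
    """Query the Consul agent for instances whose checks are all passing."""
    url = f"{consul}/v1/health/service/{urllib.parse.quote(service)}?passing=true"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return healthy_endpoints(json.load(resp))


# Trimmed example of the response shape (only the fields used above)
sample = [
    {"Node": {"Address": "10.0.1.7"}, "Service": {"Address": "10.0.0.5", "Port": 8080}},
    {"Node": {"Address": "10.0.1.8"}, "Service": {"Address": "", "Port": 8080}},
]
print(healthy_endpoints(sample))  # ['10.0.0.5:8080', '10.0.1.8:8080']
```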

2. etcd (Kubernetes Backend)

go
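package main

import (
    "context"
    "encoding/json"
    "errors"
    "fmt"
    "log"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

type endpoint struct {
    Address string `json:"address"`
    Port    int    `json:"port"`
}

// registerService stores the instance under a 10s lease and keeps the lease alive.
// If the process dies, the lease expires and etcd deletes the key.
func registerService(ctx context.Context, cli *clientv3.Client) error {
    lease, err := cli.Grant(ctx, 10)
    if err != nil {
        return err
    }

    _, err = cli.Put(ctx,
        "/services/user-service/10.0.0.5:8080",
        `{"address":"10.0.0.5","port":8080}`,
        clientv3.WithLease(lease.ID))
    if err != nil {
        return err
    }

    // Keep-alive: the client renews the lease in the background
    ch, err := cli.KeepAlive(ctx, lease.ID)
    if err != nil {
        return err
    }
    for range ch {
        // Each message is a successful renewal
    }
    return errors.New("lease keep-alive stopped") // ctx canceled or lease lost
}

func discoverService(ctx context.Context, cli *clientv3.Client, serviceName string) ([]string, error) {
    resp, err := cli.Get(ctx, fmt.Sprintf("/services/%s/", serviceName), clientv3.WithPrefix())
    if err != nil {
        return nil, err
    }

    var endpoints []string
    for _, kv := range resp.Kvs {
        var ep endpoint
        if err := json.Unmarshal(kv.Value, &ep); err != nil {
            continue // Skip malformed entries
        }
        endpoints = append(endpoints, fmt.Sprintf("%s:%d", ep.Address, ep.Port))
    }
    return endpoints, nil
}

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx := context.Background()
    go func() { log.Println(registerService(ctx, cli)) }()

    time.Sleep(time.Second) // Demo only: give the registration time to land
    endpoints, err := discoverService(ctx, cli, "user-service")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(endpoints)
}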
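Polling with a prefix `Get` works, but etcd clients more often watch the prefix and apply changes to a local cache. With a watch, a dead instance disappears as soon as its lease expires and etcd deletes the key. The Python sketch below shows only the cache-update logic. Events are modeled as plain `(type, key, value)` arguments, an assumption standing in for the watch responses a real etcd client delivers.

```python
import json


class EndpointCache:
    """Local view of one service's keys, kept current by applying watch events."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.endpoints = {}  # key -> "address:port"

    def apply(self, event_type, key, value=None):
        # event_type is "PUT" or "DELETE", mirroring etcd's two event kinds
        if not key.startswith(self.prefix):
            return
        if event_type == "PUT":
            data = json.loads(value)
            self.endpoints[key] = f"{data['address']}:{data['port']}"
        elif event_type == "DELETE":  # Explicit delete or lease expiry
            self.endpoints.pop(key, None)

    def snapshot(self):
        return sorted(self.endpoints.values())


cache = EndpointCache("/services/user-service/")
cache.apply("PUT", "/services/user-service/10.0.0.5:8080", '{"address":"10.0.0.5","port":8080}')
cache.apply("PUT", "/services/user-service/10.0.0.6:8080", '{"address":"10.0.0.6","port":8080}')
cache.apply("DELETE", "/services/user-service/10.0.0.5:8080")  # e.g. its lease expired
print(cache.snapshot())  # ['10.0.0.6:8080']
```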

3. Eureka (Netflix OSS)

javascript
// Node.js Eureka client
const Eureka = require('eureka-js-client').Eureka;

const client = new Eureka({
  instance: {
    app: 'user-service',
    hostName: 'localhost',
    ipAddr: '10.0.0.5',
    port: {
      '$': 8080,
      '@enabled': true,
    },
    vipAddress: 'user-service',
    dataCenterInfo: {
      '@class': 'com.netflix.appinfo.InstanceInfo$DefaultDataCenterInfo',
      name: 'MyOwn',
    },
  },
  eureka: {
    host: 'eureka',
    port: 8761,
    servicePath: '/eureka/apps/',
  },
});

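// Register with Eureka and start fetching the registry
client.start((error) => {
  if (error) console.error('Eureka client failed to start:', error);
});

// Discover: reads the client's local registry cache, so results are only
// available after start() has completed its first registry fetch
function getServiceInstances(appName) {
  const instances = client.getInstancesByAppId(appName);
  return instances.map(i => ({
    host: i.ipAddr,
    port: i.port.$
  }));
}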
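Clients without a Eureka library can read Eureka's REST API directly. This sketch assumes the JSON shape returned by `GET /eureka/apps/<APP>` with `Accept: application/json`: an `application` object holding an `instance` list, where each entry has `ipAddr`, `port.$`, and `status`. It keeps only `UP` instances, and it also accepts a lone instance object as a defensive measure. Function names are illustrative.

```python
import json
import urllib.request


def up_instances(payload):
    """Extract ["ip:port", ...] for UP instances from a Eureka /apps/<APP> response."""
    instances = payload["application"]["instance"]
    if isinstance(instances, dict):  # Defensive: tolerate a single object instead of a list
        instances = [instances]
    return [
        f"{inst['ipAddr']}:{int(inst['port']['$'])}"
        for inst in instances
        if inst.get("status") == "UP"
    ]


def fetch_instances(app, eureka="http://eureka:8761/eureka"):
    req = urllib.request.Request(
        f"{eureka}/apps/{app.upper()}",  # Eureka stores app names upper-cased
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return up_instances(json.load(resp))


sample = {"application": {"name": "USER-SERVICE", "instance": [
    {"ipAddr": "10.0.0.5", "status": "UP", "port": {"$": 8080, "@enabled": "true"}},
    {"ipAddr": "10.0.0.6", "status": "DOWN", "port": {"$": 8080, "@enabled": "true"}},
]}}
print(up_instances(sample))  # ['10.0.0.5:8080']
```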

DNS-Based Discovery

Simplest approach: publish DNS SRV records, which give each instance's host and port.

; DNS SRV records (short 30s TTL so clients notice instance changes quickly)
_http._tcp.user-service.local. 30 IN SRV 0 5 8080 instance1.local.
_http._tcp.user-service.local. 30 IN SRV 0 5 8080 instance2.local.
_http._tcp.user-service.local. 30 IN SRV 0 5 8080 instance3.local.

Record format: name TTL class SRV priority weight port target

Implementation:

python
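import random

import dns.resolver  # dnspython 2.x


def discover_service_dns(service_name):
    """Discover service instances via DNS SRV records."""
    try:
        answers = dns.resolver.resolve(f'_http._tcp.{service_name}', 'SRV')
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

    return [
        {
            'host': str(rdata.target).rstrip('.'),
            'port': rdata.port,
            'priority': rdata.priority,
            'weight': rdata.weight,
        }
        for rdata in answers
    ]


def pick_instance(instances):
    """RFC 2782: use the lowest priority value, then choose randomly by weight."""
    best = min(i['priority'] for i in instances)
    candidates = [i for i in instances if i['priority'] == best]
    weights = [i['weight'] for i in candidates]
    if sum(weights) == 0:
        return random.choice(candidates)
    return random.choices(candidates, weights=weights)[0]


# Usage
instances = discover_service_dns('user-service.local')
if instances:
    instance = pick_instance(instances)
    url = f"http://{instance['host']}:{instance['port']}"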
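Resolvers and clients cache DNS answers for the record's TTL. That caching is what makes DNS cheap, and it is also why DNS can keep returning instances that no longer exist. The sketch below is a client-side cache that honors the answer's TTL and caps it. It assumes dnspython 2.x for the real lookup (`answer.rrset.ttl`). `TTLCache`, `srv_lookup`, and the 60-second cap are hypothetical; the injectable `lookup` callable lets the caching logic run without a DNS server.

```python
import time


class TTLCache:
    """Caches lookup results for as long as the answer's TTL allows (capped at max_ttl)."""

    def __init__(self, lookup, clock=time.monotonic, max_ttl=60):
        self.lookup = lookup    # Callable: name -> (records, ttl_seconds)
        self.clock = clock
        self.max_ttl = max_ttl  # Cap long TTLs so changes are picked up reasonably fast
        self.entries = {}       # name -> (records, expires_at)

    def get(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit and now < hit[1]:
            return hit[0]
        records, ttl = self.lookup(name)
        self.entries[name] = (records, now + min(ttl, self.max_ttl))
        return records


def srv_lookup(name):
    import dns.resolver  # dnspython 2.x; imported lazily so the cache logic has no dependency
    answer = dns.resolver.resolve(name, "SRV")
    return [(str(r.target).rstrip("."), r.port) for r in answer], answer.rrset.ttl


# Usage with a fake lookup that reports a 30s TTL
now = [0.0]
calls = []


def fake_lookup(name):
    calls.append(name)
    return [("instance1.local", 8080)], 30


cache = TTLCache(fake_lookup, clock=lambda: now[0])
cache.get("_http._tcp.user-service.local")
now[0] = 29.0
cache.get("_http._tcp.user-service.local")  # Served from cache
now[0] = 31.0
cache.get("_http._tcp.user-service.local")  # TTL expired: looked up again
print(len(calls))  # 2
```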

Real-World Patterns

1. Netflix (Eureka + Ribbon)

Architecture:
- Eureka: Service registry
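- Ribbon: Client-side load balancer (now in maintenance mode; Spring Cloud LoadBalancer is the usual replacement)
- Hystrix: Circuit breaker (now in maintenance mode; Resilience4j is the usual replacement)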

Flow:
1. Services register with Eureka
2. Ribbon queries Eureka for service list
3. Ribbon balances requests across instances
4. Hystrix wraps calls for fault tolerance

2. Kubernetes (DNS + kube-proxy)

Architecture:
- CoreDNS: DNS-based service discovery
- kube-proxy: programs iptables/IPVS rules that spread connections across pods
- Service: Stable VIP for pod set

Flow:
1. Pods labeled (app=user-service)
2. Service selects pods by label; ready pods become its endpoints
3. DNS resolves service name to VIP
4. kube-proxy routes VIP to pod IPs

3. AWS (ELB + Route 53)

Architecture:
- ELB: Load balancer (server-side discovery)
- Route 53: DNS service
- Auto Scaling: Dynamic instance management

Flow:
1. Auto Scaling registers instances with the ELB (or its target group)
2. A Route 53 alias record points the service name at the ELB's DNS name
3. ELB health checks instances
4. Clients call ELB, never instances directly

Comparison

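| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Client-Side | More control, no extra network hop, no central LB to fail | Complex client, language-specific libraries | Microservices mesh |
| Server-Side | Simple client, language-agnostic | Extra hop; LB can become a bottleneck/SPOF unless made highly available | Polyglot systems |
| DNS | Universal, simple | Caching delays updates; plain DNS has no health checks | Legacy integration |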

Interview Tips 💡

When discussing service discovery in interviews:

  1. Problem: "In the cloud, IPs change constantly, so you can't hardcode addresses..."
  2. Two patterns: "Client-side (the client picks an instance) vs server-side (a load balancer picks)..."
  3. Health checks: "The registry must know which instances are healthy..."
  4. Tools: "Consul for discovery and service mesh, etcd as Kubernetes' backing store, Eureka for the Netflix/Spring Cloud stack..."
  5. K8s example: "A Service gets a stable VIP, DNS resolves its name to that VIP, and kube-proxy routes to ready pods..."
  6. Trade-offs: "Client-side gives control but adds client complexity; server-side keeps clients simple, but the LB can become a bottleneck..."


About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.


Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
