The Dynamic Infrastructure Problem
Old World (2005):
- Web Server: 192.168.1.10 (hardcoded)
- Database: 192.168.1.20 (never changes)
- config.yml: DB_HOST=192.168.1.20 ✓

New World (2025, Kubernetes):
- Containers start and stop every minute
- IP addresses change constantly
- Autoscaling adds and removes instances
- config.yml: DB_HOST=??? ❌
Solution: Service Discovery, a dynamic phonebook for your infrastructure.
What is Service Discovery?
Service Discovery enables services to find and communicate with each other without hardcoded IPs, adapting automatically to infrastructure changes.
The Pattern
1. Service Registration: Service A starts → "I'm alive at 10.0.0.5:8080"
2. Health Checking: The registry pings Service A every 10s → still alive?
3. Service Lookup: Service B needs Service A → it queries the registry
4. Connection: Service B connects to 10.0.0.5:8080
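The four steps above can be modeled with a toy in-memory registry. This is a minimal sketch for intuition only, not how Consul or Eureka work internally. The `Registry` and `Instance` names, and the idea of storing a probe callable per instance, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instance:
    service: str
    address: str
    port: int
    probe: Callable[[], bool]  # hypothetical health probe (e.g. an HTTP GET /health)
    healthy: bool = True


class Registry:
    """Toy registry: register, health-check, look up (all names hypothetical)."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Instance]] = {}

    def register(self, inst: Instance) -> None:
        # Step 1: the service announces "I'm alive at address:port"
        self._instances.setdefault(inst.service, []).append(inst)

    def run_health_checks(self) -> None:
        # Step 2: a real registry would run this on a timer (e.g. every 10s)
        for instances in self._instances.values():
            for inst in instances:
                try:
                    inst.healthy = inst.probe()
                except Exception:
                    inst.healthy = False

    def lookup(self, service: str) -> List[str]:
        # Step 3: return only instances that passed their last check
        return [f"{i.address}:{i.port}"
                for i in self._instances.get(service, []) if i.healthy]


registry = Registry()
registry.register(Instance("service-a", "10.0.0.5", 8080, probe=lambda: True))
registry.register(Instance("service-a", "10.0.0.6", 8080, probe=lambda: False))
registry.run_health_checks()
# Step 4: Service B picks one healthy address and connects to it
print(registry.lookup("service-a"))  # ['10.0.0.5:8080']
```

A real registry runs the checks on a schedule and serves lookups over the network, but the contract is the same: lookups return only instances that passed their most recent check.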
Two Patterns
1. Client-Side Discovery
Client queries registry and chooses instance.
graph LR
SA[Service A<br/>10.0.0.5] -.->|1. Register| R[Registry<br/>Consul/Eureka]
SB[Service B] -->|2. Query| R
R -->|"3. IPs: [10.0.0.5,<br/>10.0.0.6]"| SB
SB -->|4. Connect| SA
Implementation (Netflix Eureka):
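// Service A - Registration
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@SpringBootApplication
@EnableEurekaClient // Removed in Spring Cloud 2022.0+, where the eureka-client dependency alone enables registration
public class ServiceA {
    public static void main(String[] args) {
        SpringApplication.run(ServiceA.class, args);
    }
}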
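# application.yml (Service A)
spring:
  application:
    name: service-a  # Name registered in Eureka (displayed as SERVICE-A)
eureka:
  client:
    serviceUrl:
      defaultZone: http://eureka:8761/eureka/
  instance:
    preferIpAddress: true
    leaseRenewalIntervalInSeconds: 10 # Heartbeat interval (default is 30s)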
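// Service B - Discovery
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class ServiceB {
    @Autowired
    private DiscoveryClient discoveryClient;

    @Autowired
    private RestTemplate restTemplate; // Requires a RestTemplate @Bean defined elsewhere

    @GetMapping("/call-service-a")
    public String callServiceA() {
        // 1. Get instances from the registry (served from the Eureka client's local cache)
        List<ServiceInstance> instances =
            discoveryClient.getInstances("SERVICE-A");
        if (instances.isEmpty()) {
            return "No instances available";
        }
        // 2. Client-side load balancing: pick a random instance
        ServiceInstance instance = instances.get(
            ThreadLocalRandom.current().nextInt(instances.size())
        );
        // 3. Make the request
        String url = String.format("http://%s:%d/api/data",
            instance.getHost(), instance.getPort());
        return restTemplate.getForObject(url, String.class);
    }
}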
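The Java example picks a random instance on every call. Round-robin is a common alternative that spreads requests evenly across whatever list the registry returned. Below is a minimal Python sketch. `RoundRobin` is a hypothetical helper, not part of any discovery client. It keeps one counter per service, and it uses a modulo so the index stays valid even when the instance list changes between calls.

```python
import itertools
import threading
from typing import Dict, List


class RoundRobin:
    """Client-side round-robin over the latest instance list (hypothetical helper)."""

    def __init__(self) -> None:
        self._counters: Dict[str, itertools.count] = {}
        self._lock = threading.Lock()

    def choose(self, service: str, instances: List[str]) -> str:
        if not instances:
            raise LookupError(f"no instances available for {service}")
        with self._lock:
            counter = self._counters.setdefault(service, itertools.count())
            n = next(counter)
        # Modulo keeps the index valid even if the list shrank since the last call
        return instances[n % len(instances)]


rr = RoundRobin()
pool = ["10.0.0.5:8080", "10.0.0.6:8080"]
print([rr.choose("service-a", pool) for _ in range(3)])
# ['10.0.0.5:8080', '10.0.0.6:8080', '10.0.0.5:8080']
```

Random selection needs no shared state. Round-robin gives a more even spread, but it needs a counter that is safe to share across request threads, which is why the sketch uses a lock.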
2. Server-Side Discovery
Load balancer queries registry and forwards traffic.
graph LR
SA[Service A<br/>10.0.0.5] -.->|1. Register| R[Registry<br/>etcd]
SB[Service B] -->|2. Request| LB[Load Balancer]
LB -->|3. Query| R
R -->|4. IPs| LB
LB -->|5. Forward| SA
Implementation (Kubernetes):
# Service A - Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: service-a:1.0
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
---
# Service A - Service (Load Balancer)
apiVersion: v1
kind: Service
metadata:
  name: service-a
spec:
  selector:
    app: service-a
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP
---
# Service B - Calls Service A
# Just use DNS: http://service-a/api/data
# Kubernetes DNS resolves to service VIP
# Kube-proxy load balances to pods
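From Service B's side, server-side discovery reduces to an ordinary name lookup. The sketch below uses only Python's standard `socket.getaddrinfo` (`resolve_service` is a hypothetical name). Inside a cluster, a Service named `service-a` in the `default` namespace would typically be reachable as `service-a` or `service-a.default.svc.cluster.local`, and a ClusterIP Service resolves to a single virtual IP rather than to individual pod IPs.

```python
import socket
from typing import List, Tuple


def resolve_service(host: str, port: int) -> List[Tuple[str, int]]:
    """Resolve a service name to (ip, port) pairs using the system resolver."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Deduplicate while preserving resolver order
    seen: List[Tuple[str, int]] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in seen:
            seen.append(addr)
    return seen


# Inside a cluster this would be e.g. resolve_service("service-a", 80):
# a ClusterIP Service returns one virtual IP, and kube-proxy picks the pod.
print(resolve_service("127.0.0.1", 80))  # [('127.0.0.1', 80)]
```

A headless Service (`clusterIP: None`) is the exception. Its DNS name resolves to the individual pod IPs, which moves load balancing back to the client.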
Health Checks
Problem: An instance can remain registered after it has crashed or become unhealthy.
Solution: Continuous health monitoring, so the registry stops returning instances that fail their checks.
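Continuous monitoring usually does not flip an instance's status on a single result. Otherwise, one slow response would make the instance flap in and out of the registry. Kubernetes probes, for example, expose `failureThreshold` and `successThreshold` for this purpose. The sketch below illustrates the idea. `HealthState` and its defaults are hypothetical and are not taken from any specific registry.

```python
class HealthState:
    """Flip status only after N consecutive results (hypothetical helper)."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 1) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = True
        self._fails = 0
        self._passes = 0

    def record(self, ok: bool) -> bool:
        """Record one check result and return the (possibly updated) status."""
        if ok:
            self._passes += 1
            self._fails = 0
            if not self.healthy and self._passes >= self.success_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._passes = 0
            if self.healthy and self._fails >= self.failure_threshold:
                self.healthy = False
        return self.healthy


state = HealthState(failure_threshold=3)
print([state.record(ok) for ok in [False, False, True, False, False, False, True]])
# [True, True, True, True, True, False, True]
```

Two isolated failures are absorbed. Only the third consecutive failure marks the instance unhealthy, and a single success then restores it because `success_threshold` is 1.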
Types of Health Checks
1. HTTP Health Check
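// Express.js
const express = require('express');
const app = express();

// Assumption: `db` and `redis` are already-connected clients that expose
// a promise-returning ping(); substitute your own dependency checks.

// Liveness: Is the process alive?
app.get('/health/alive', (req, res) => {
  res.status(200).send('OK');
});

// Readiness: Can it handle traffic?
app.get('/health/ready', async (req, res) => {
  try {
    // Check dependencies
    await db.ping();
    await redis.ping();
    res.status(200).send('Ready');
  } catch (error) {
    // 503 stops traffic to this instance without restarting the process
    res.status(503).send('Not Ready');
  }
});

app.listen(8080);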
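Endpoints like these are only half of an HTTP check. The registry or orchestrator also has to call them and interpret the status code. A minimal registry-side probe using only Python's standard library might look like the following (`http_check` is a hypothetical name). It treats only 2xx as healthy. Real systems differ slightly: Kubernetes accepts any status from 200 to 399, and Consul treats 429 as a warning rather than a failure.

```python
import urllib.error
import urllib.request


def http_check(url: str, timeout: float = 1.0) -> bool:
    """Registry-side HTTP probe: healthy only on a 2xx response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False  # e.g. 503 from /health/ready when a dependency is down
    except OSError:
        return False  # connection refused, DNS failure, or timeout
```

For example, `http_check("http://10.0.0.5:8080/health/ready")` would return `False` while the readiness endpoint answers 503, even though `/health/alive` still passes.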
2. TCP Health Check
Consul service definition (JSON):
{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "tcp": "localhost:8080",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
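A TCP check only proves that something accepts connections on the port. A process that is listening but deadlocked would still pass. The probe amounts to a connect with a timeout, sketched here with Python's standard `socket` module. `tcp_check` is a hypothetical name, and its 1s default simply mirrors the `timeout` in the Consul definition.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Pass if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # handshake completed; close immediately
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

This is cheaper than an HTTP check and works for non-HTTP services such as databases or message brokers. The tradeoff is that it says nothing about whether the application can actually serve requests.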
3. TTL Health Check
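import time

import consul  # python-consul client

c = consul.Consul()

# Register service with a TTL check: it turns critical unless
# the service checks in at least every 15s
c.agent.service.register(
    'my-service',
    service_id='my-service-1',
    port=8080,
    check=consul.Check.ttl('15s')
)

# Heartbeat loop; a service's single check has the ID "service:<service_id>"
while True:
    try:
        c.agent.check.ttl_pass('service:my-service-1')
    except Exception:
        try:
            c.agent.check.ttl_fail('service:my-service-1')
        except Exception:
            pass  # Agent unreachable; the TTL will expire and mark the check critical
    time.sleep(10)  # Shorter than the 15s TTL, leaving slack for a slow cycle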
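With a TTL check, the direction is reversed. The registry sends nothing and simply marks the check critical when no update arrives within the TTL. The toy model below captures that rule. The `TTLCheck` class and its injectable `clock` are assumptions for illustration, with method names borrowed from the python-consul calls above. Starting in the critical state until the first check-in is a choice made for this sketch.

```python
import time
from typing import Callable, Optional


class TTLCheck:
    """Passive check: critical unless the service checks in within `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._last_pass: Optional[float] = None
        self._failed = False

    def ttl_pass(self) -> None:
        self._last_pass = self.clock()
        self._failed = False

    def ttl_fail(self) -> None:
        self._failed = True

    def status(self) -> str:
        if self._failed or self._last_pass is None:
            return "critical"
        if self.clock() - self._last_pass > self.ttl:
            return "critical"  # heartbeat missed: the process may have hung or died
        return "passing"


now = [0.0]  # fake clock so the example runs instantly
check = TTLCheck(ttl=15, clock=lambda: now[0])
check.ttl_pass()
now[0] = 10.0
print(check.status())  # passing
now[0] = 26.0
print(check.status())  # critical
```

TTL checks suit services the registry cannot reach directly, such as workers behind NAT. They also catch hung processes that still accept TCP connections, because a hung process stops sending heartbeats.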
Service Registry Implementations
1. Consul (HashiCorp)
import random

import consul  # python-consul client
import requests

# Register service (host and port are separate arguments)
c = consul.Consul(host='consul', port=8500)
c.agent.service.register(
    'user-service',
    service_id='user-service-1',
    address='10.0.0.5',
    port=8080,
    tags=['v1', 'production'],
    check=consul.Check.http('http://10.0.0.5:8080/health', interval='10s')
)

# Discover service
def get_service_address(service_name):
    # passing=True returns only instances whose health checks are passing
    _, services = c.health.service(service_name, passing=True)
    if not services:
        raise RuntimeError(f"No healthy instances of {service_name}")
    # Client-side load balancing
    service = random.choice(services)
    return f"http://{service['Service']['Address']}:{service['Service']['Port']}"

# Use
url = get_service_address('user-service')
response = requests.get(f"{url}/api/users")
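Calling `c.health.service` before every request adds a registry round trip to each call. A common mitigation is to cache the healthy list briefly and refresh it when it expires, accepting that the list may be a few seconds stale. The wrapper below is a hypothetical sketch. `CachedResolver`, `max_age`, and the serve-stale-on-error behavior are illustrative choices, and `lookup` stands in for any function that returns `address:port` strings, such as a variant of `get_service_address` that returns the whole list.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachedResolver:
    """Cache healthy-instance lists for `max_age` seconds (hypothetical helper)."""

    def __init__(self, lookup: Callable[[str], List[str]], max_age: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # e.g. a function wrapping c.health.service(...)
        self._max_age = max_age
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def instances(self, service: str) -> List[str]:
        now = self._clock()
        cached: Optional[Tuple[float, List[str]]] = self._cache.get(service)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]  # fresh enough: no registry round trip
        try:
            fresh = self._lookup(service)
        except Exception:
            if cached is not None:
                return cached[1]  # registry unreachable: serve the stale list
            raise
        self._cache[service] = (now, fresh)
        return fresh

    def pick(self, service: str) -> str:
        candidates = self.instances(service)
        if not candidates:
            raise LookupError(f"no healthy instances of {service}")
        return random.choice(candidates)  # client-side load balancing


calls = []


def fake_lookup(service: str) -> List[str]:
    # Stands in for a real Consul query in this example
    calls.append(service)
    return ["10.0.0.5:8080", "10.0.0.6:8080"]


resolver = CachedResolver(fake_lookup, max_age=5.0)
resolver.pick("user-service")
resolver.pick("user-service")
print(len(calls))  # 1: the second pick was served from the cache
```

Serving a stale list when the registry is down keeps traffic flowing during a registry outage. The cost is that requests may briefly go to instances that have since died, so this pairs well with retries.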
2. etcd (Kubernetes Backend)
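package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

type endpoint struct {
	Address string `json:"address"`
	Port    int    `json:"port"`
}

func newClient() (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
}

func registerService() error {
	cli, err := newClient()
	if err != nil {
		return err
	}
	defer cli.Close()
	// Register with a 10s TTL lease: the key is deleted if renewals stop
	lease, err := cli.Grant(context.TODO(), 10)
	if err != nil {
		return err
	}
	_, err = cli.Put(context.TODO(),
		"/services/user-service/10.0.0.5:8080",
		`{"address":"10.0.0.5","port":8080}`,
		clientv3.WithLease(lease.ID))
	if err != nil {
		return err
	}
	// Keep-alive: the client renews the lease in the background
	ch, err := cli.KeepAlive(context.TODO(), lease.ID)
	if err != nil {
		return err
	}
	for range ch {
		// Renewed; the channel closes if the lease is lost
	}
	return fmt.Errorf("keep-alive channel closed")
}

func discoverService(serviceName string) ([]string, error) {
	cli, err := newClient()
	if err != nil {
		return nil, err
	}
	defer cli.Close()
	resp, err := cli.Get(context.TODO(),
		fmt.Sprintf("/services/%s/", serviceName),
		clientv3.WithPrefix())
	if err != nil {
		return nil, err
	}
	var endpoints []string
	for _, kv := range resp.Kvs {
		// Parse the JSON value to get the address
		var ep endpoint
		if err := json.Unmarshal(kv.Value, &ep); err != nil {
			continue // Skip malformed entries
		}
		endpoints = append(endpoints, fmt.Sprintf("%s:%d", ep.Address, ep.Port))
	}
	return endpoints, nil
}

func main() {
	go func() { log.Println(registerService()) }()
	time.Sleep(time.Second) // Give registration a moment
	eps, err := discoverService("user-service")
	log.Println(eps, err)
}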
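`discoverService` reads the prefix once. Long-running clients usually follow the initial read with a watch on the same prefix (the Go client offers `cli.Watch(ctx, prefix, clientv3.WithPrefix())`) and apply each PUT or DELETE event to a local map. The Python sketch below shows only that bookkeeping. The `(type, key, value)` event tuples are a simplified, hypothetical stand-in for etcd's watch responses, and no etcd client is involved.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Simplified event shape (hypothetical): (type, key, value), type is "PUT" or "DELETE"
Event = Tuple[str, str, bytes]


class EndpointCache:
    """Local view of /services/<name>/ keys, kept current by watch events."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._endpoints: Dict[str, str] = {}  # key -> "address:port"

    def apply(self, events: Iterable[Event]) -> None:
        for kind, key, value in events:
            if not key.startswith(self.prefix):
                continue  # not this service
            if kind == "PUT":
                data = json.loads(value)
                self._endpoints[key] = f"{data['address']}:{data['port']}"
            elif kind == "DELETE":
                # Also fires when a registration lease expires
                self._endpoints.pop(key, None)

    def endpoints(self) -> List[str]:
        return sorted(self._endpoints.values())


cache = EndpointCache("/services/user-service/")
cache.apply([
    ("PUT", "/services/user-service/10.0.0.5:8080", b'{"address":"10.0.0.5","port":8080}'),
    ("PUT", "/services/user-service/10.0.0.6:8080", b'{"address":"10.0.0.6","port":8080}'),
    ("DELETE", "/services/user-service/10.0.0.5:8080", b""),
])
print(cache.endpoints())  # ['10.0.0.6:8080']
```

Because registrations are attached to leases, a crashed instance's key is deleted when its lease expires. Watchers receive that deletion as an ordinary DELETE event, so no separate health-check channel is needed.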
3. Eureka (Netflix OSS)
// Node.js Eureka client
const Eureka = require('eureka-js-client').Eureka;

const client = new Eureka({
  instance: {
    app: 'user-service',
    hostName: 'localhost',
    ipAddr: '10.0.0.5',
    port: {
      '$': 8080,
      '@enabled': true,
    },
    vipAddress: 'user-service',
    dataCenterInfo: {
      '@class': 'com.netflix.appinfo.InstanceInfo$DefaultDataCenterInfo',
      name: 'MyOwn',
    },
  },
  eureka: {
    host: 'eureka',
    port: 8761,
    servicePath: '/eureka/apps/',
  },
});

// Register (and start sending heartbeats)
client.start((error) => {
  if (error) console.error('Eureka registration failed:', error);
});

// Discover
function getServiceInstances(appName) {
  const instances = client.getInstancesByAppId(appName);
  return instances.map(i => ({
    host: i.ipAddr,
    port: i.port.$,
  }));
}
DNS-Based Discovery
Simplest approach: Use DNS SRV records.
; DNS SRV records (zone-file syntax)
; Fields: name  TTL  class  SRV  priority  weight  port  target
; TTL 86400 = 24h: resolvers may cache these answers that long
_http._tcp.user-service.local. 86400 IN SRV 0 5 8080 instance1.local.
_http._tcp.user-service.local. 86400 IN SRV 0 5 8080 instance2.local.
_http._tcp.user-service.local. 86400 IN SRV 0 5 8080 instance3.local.
Implementation:
import dns.resolver  # dnspython 2.x

def discover_service_dns(service_name):
    """Discover service via DNS SRV"""
    try:
        answers = dns.resolver.resolve(f'_http._tcp.{service_name}', 'SRV')
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    instances = []
    for rdata in answers:
        instances.append({
            'host': str(rdata.target).rstrip('.'),
            'port': rdata.port,
            'priority': rdata.priority,
            'weight': rdata.weight
        })
    # Lowest priority first; within a priority, higher weight first
    # (a simplification: RFC 2782 specifies weighted random selection)
    instances.sort(key=lambda x: (x['priority'], -x['weight']))
    return instances

# Usage
instances = discover_service_dns('user-service.local')
if instances:
    instance = instances[0]
    url = f"http://{instance['host']}:{instance['port']}"
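Taking `instances[0]` after sorting always sends traffic to the heaviest target. RFC 2782 instead asks clients to use the lowest-priority group and pick within it at random, in proportion to weight. The function below is a simplified sketch of that rule. `pick_srv_target` is a hypothetical name, and it takes the dictionaries returned by `discover_service_dns`. Unlike the RFC, it gives zero-weight records no chance at all unless the whole group has weight 0.

```python
import random
from typing import Dict, List, Optional


def pick_srv_target(instances: List[Dict], rng: Optional[random.Random] = None) -> Dict:
    """Pick one SRV target: lowest priority first, then weighted random by `weight`."""
    if not instances:
        raise LookupError("no SRV records")
    rng = rng or random.Random()
    best = min(i["priority"] for i in instances)
    group = [i for i in instances if i["priority"] == best]  # lower value = preferred
    weights = [i["weight"] for i in group]
    if sum(weights) == 0:
        return rng.choice(group)  # all zero-weight: choose uniformly
    return rng.choices(group, weights=weights, k=1)[0]


records = [
    {"host": "instance1.local", "port": 8080, "priority": 0, "weight": 5},
    {"host": "instance2.local", "port": 8080, "priority": 0, "weight": 5},
    {"host": "instance3.local", "port": 8080, "priority": 0, "weight": 5},
]
target = pick_srv_target(records)
url = f"http://{target['host']}:{target['port']}"
```

With the three equal-weight records from the zone file, each target receives roughly a third of the picks. Records with a higher priority value are used only when no lower-priority record exists in the answer.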
Real-World Patterns
1. Netflix (Eureka + Ribbon)
Architecture:
- Eureka: Service registry
- Ribbon: Client-side load balancer
- Hystrix: Circuit breaker

Flow:
1. Services register with Eureka
2. Ribbon queries Eureka for the service list
3. Ribbon balances requests across instances
4. Hystrix wraps calls for fault tolerance
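Hystrix's role in this flow is the circuit-breaker pattern. After repeated failures, stop calling the failing dependency for a while and return a fallback immediately. Hystrix itself has been in maintenance mode since 2018, so the following is only a minimal, single-threaded Python sketch of the pattern. Its names and thresholds are hypothetical and do not reflect Hystrix's API or defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after `reset_after` s."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast without calling the instance
            self.opened_at = None  # half-open: let a trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0  # success closes the circuit
        return result
```

Failing fast protects the caller's threads and gives the struggling service time to recover. A failed trial call in the half-open state reopens the circuit immediately, because the failure count is still at the threshold.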
2. Kubernetes (DNS + kube-proxy)
Architecture:
- CoreDNS: DNS-based service discovery
- kube-proxy: iptables (or IPVS) load balancing
- Service: Stable VIP for a set of pods

Flow:
1. Pods are labeled (app=user-service)
2. The Service selects pods by label
3. DNS resolves the Service name to the VIP
4. kube-proxy routes the VIP to pod IPs
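Flow steps 1, 2, and 4 boil down to label matching plus readiness. A Service's selector picks every pod whose labels contain all of its key/value pairs, and only pods that pass their readiness probe receive traffic. The sketch below models that in plain Python. `Pod` and `endpoints_for` are hypothetical; in a real cluster this work is done by the EndpointSlice controller and kube-proxy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str]
    ready: bool  # result of the pod's readinessProbe


def endpoints_for(selector: Dict[str, str], pods: List[Pod], target_port: int) -> List[str]:
    """Pods whose labels include every selector pair and that are Ready."""
    return [f"{p.ip}:{target_port}" for p in pods
            if p.ready and all(p.labels.get(k) == v for k, v in selector.items())]


pods = [
    Pod("service-a-1", "10.1.0.4", {"app": "service-a"}, ready=True),
    Pod("service-a-2", "10.1.0.5", {"app": "service-a"}, ready=False),
    Pod("user-svc-1", "10.1.0.6", {"app": "user-service"}, ready=True),
]
print(endpoints_for({"app": "service-a"}, pods, 8080))  # ['10.1.0.4:8080']
```

The not-ready pod `service-a-2` still matches the selector but is left out, which is how a failing readiness probe takes a pod out of rotation without restarting it.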
3. AWS (ELB + Route 53)
Architecture:
- ELB: Load balancer (server-side discovery)
- Route 53: DNS service
- Auto Scaling: Dynamic instance management

Flow:
1. Instances register with ELB
2. Route 53 points to the ELB DNS name
3. ELB health-checks instances
4. Clients call ELB, never instances directly
Comparison
| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Client-Side | More control, no SPOF | Complex client, language-specific | Microservices mesh |
| Server-Side | Simple client, language-agnostic | LB is bottleneck/SPOF | Polyglot systems |
| DNS | Universal, simple | Caching issues, no health checks | Legacy integration |
Interview Tips 💡
When discussing service discovery in interviews:
- Problem: "In the cloud, IPs change constantly, so you can't hardcode addresses..."
- Two patterns: "Client-side (client picks instance) vs server-side (LB picks)..."
- Health checks: "Registry must know which instances are healthy..."
- Tools: "Consul for service mesh, etcd for Kubernetes, Eureka for Netflix stack..."
- K8s example: "Service creates stable VIP, DNS resolves to it, kube-proxy routes to pods..."
- Trade-offs: "Client-side gives control but complex; server-side simple but LB bottleneck..."
Related Concepts
- Load Balancing — Traffic distribution
- Health Checks — Service monitoring
- Service Mesh — Advanced service networking
- DNS — Name resolution
- Microservices — Architecture pattern
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.