Generating Unique IDs at Scale

In a single-server setup, AUTO_INCREMENT is perfect. In a distributed system with 100 database shards, it fails immediately.

The Challenge: How do you generate globally unique, sortable, compact IDs across hundreds of servers without a centralized coordinator?

Requirements

A good distributed ID system must satisfy:

Globally Unique: No two objects can ever share an ID (Collisions = Data Corruption).
Sortable: IDs should be time-ordered so ID1 < ID2 if Event1 happened before Event2.
Compact: 64-bit integers are better for database indexing than 128-bit strings.
High Availability: ID generation cannot be a single point of failure (SPOF).
High Performance: Must generate millions of IDs per second across the cluster.

Approach 1: UUID (Universally Unique Identifier)

A 128-bit number usually displayed as hex string: 123e4567-e89b-12d3-a456-426614174000.

UUID v4 (Random)

Completely random bits based on cryptographic randomness.

python

import uuid

id = uuid.uuid4()
print(id)  # e.g., '3c9f1b7a-4f3e-4e1a-9c2b-8d7e6f5a4b3c'

Click to expand code...

Pros:

Can be generated offline by the client
Zero coordination needed
Extremely low collision probability (~0% for practical purposes)

Cons:

Not sortable: Random ordering
Database unfriendly: Indexing random UUIDs causes B-Tree page fragmentation
Large size: 128 bits = 16 bytes (vs 8 bytes for int64)

Performance Impact:

sql

-- Sequential IDs: New rows go to the end (fast)
INSERT INTO posts (id, ...) VALUES (1001, ...), (1002, ...);

-- Random UUIDs: New rows scattered everywhere (slow, fragmentation)
INSERT INTO posts (id, ...) VALUES (uuid(), ...), (uuid(), ...);

Click to expand code...

UUID v7 (Time-Ordered) ⭐

Modern UUID standard (RFC 9562) that embeds timestamp for sortability.

Structure:

48 bits: Unix timestamp in milliseconds
12 bits: Randomized sequence
62 bits: Random data
6 bits: Version and variant markers

javascript

// JavaScript implementation
function uuidv7() {
  const timestamp = Date.now();
  const timestampHex = timestamp.toString(16).padStart(12, '0');
  const randomHex = Math.random().toString(16).substr(2, 16);
  
  return `${timestampHex.substr(0, 8)}-${timestampHex.substr(8, 4)}-7${randomHex.substr(0, 3)}-${randomHex.substr(3, 4)}-${randomHex.substr(7)}`;
}

console.log(uuidv7());  // '018d3e2a-1b2c-7d4e-a5f6-b7c8d9e0f1a2'

Click to expand code...

Pros:

Sortable! Preserves insertion order
Database friendly (sequential writes)
No coordination needed

Cons:

Still 128-bit (2x larger than int64)
Requires storing as string or binary in most databases

Use Case: Ideal for databases that don't support custom ID generation but need sortability.

Approach 2: Central Ticket Server (Flickr Strategy)

Use a centralized database just to hand out IDs.

MySQL Implementation:

sql

CREATE TABLE Tickets64 (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  stub CHAR(1) NOT NULL DEFAULT ''
);

-- Get next ID
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Click to expand code...

Python Client:

python

def get_next_id():
    cursor.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    cursor.execute("SELECT LAST_INSERT_ID()")
    return cursor.fetchone()[0]

Click to expand code...

Pros:

Simple to implement
Guarantees sequential numbers
Works with existing infrastructure

Cons:

Single Point of Failure: If this DB dies, you can't create anything
High latency: Network roundtrip for every ID (slow!)
Limited throughput: ~10k IDs/sec max (database bottleneck)

Multi-Master Variant:

sql

-- Server 1: IDs 1, 3, 5, 7, 9...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 1;

-- Server 2: IDs 2, 4, 6, 8, 10...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 2;

Click to expand code...

Reduces SPOF but IDs are no longer sequential.

Approach 3: Twitter Snowflake ⭐⭐⭐ (Recommended)

Snowflake ID Generator

Timestamp (Epoch) (41 bits)

1779023909678

...

Machine ID (10 bits)

0101010101

Sequence (12 bits)

0000

17790239096780000

64-bit INT

Recently Generated

Click Generate...

A 64-bit integer composed of bit-packed fields. This is the industry standard.

Bit Structure

Component	Bits	Range	Description
Sign Bit	1	Always 0	Keeps ID positive
Timestamp	41	0-2^41	Milliseconds since custom epoch (69 years)
Datacenter ID	5	0-31	Identifies physical datacenter
Machine ID	5	0-31	Identifies server within datacenter
Sequence	12	0-4095	Incrementing counter per millisecond

Total: 64 bits (fits in a BIGINT)

Capacity:

Per Machine: 4,096 IDs per millisecond = 4,096,000 IDs/sec
Per Datacenter: 32 machines × 4M = 131M IDs/sec
Total Cluster: 32 datacenters × 131M = 4.2 billion IDs/sec

Complete Python Implementation

python

import time
import threading

class SnowflakeIDGenerator:
    def __init__(self, datacenter_id, machine_id, custom_epoch=1609459200000):
        # Validation
        if datacenter_id < 0 or datacenter_id > 31:
            raise ValueError("Datacenter ID must be 0-31")
        if machine_id < 0 or machine_id > 31:
            raise ValueError("Machine ID must be 0-31")
        
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.custom_epoch = custom_epoch  # Jan 1, 2021
        
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()
    
    def _current_millis(self):
        return int(time.time() * 1000)
    
    def _wait_next_millis(self, last_timestamp):
        timestamp = self._current_millis()
        while timestamp <= last_timestamp:
            timestamp = self._current_millis()
        return timestamp
    
    def generate_id(self):
        with self.lock:
            timestamp = self._current_millis()
            
            # Clock moved backwards!
            if timestamp < self.last_timestamp:
                raise Exception(f"Clock moved backwards. Refusing to generate ID for {self.last_timestamp - timestamp}ms")
            
            # Same millisecond - increment sequence
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit mask
                
                # Sequence overflow - wait for next millisecond
                if self.sequence == 0:
                    timestamp = self._wait_next_millis(self.last_timestamp)
            else:
                # New millisecond - reset sequence
                self.sequence = 0
            
            self.last_timestamp = timestamp
            
            # Build the ID
            id = ((timestamp - self.custom_epoch) << 22) | \
                 (self.datacenter_id << 17) | \
                 (self.machine_id << 12) | \
                 self.sequence
            
            return id

# Usage
generator = SnowflakeIDGenerator(datacenter_id=1, machine_id=3)

for _ in range(5):
    print(generator.generate_id())
    
# Output (example):
# 7045391791689121792
# 7045391791689121793
# 7045391791689121794
# 7045391791689121795
# 7045391791689121796

Click to expand code...

Decoding Snowflake IDs

python

def decode_snowflake(id, custom_epoch=1609459200000):
    timestamp = ((id >> 22) + custom_epoch) / 1000
    datacenter = (id >> 17) & 0x1F  # 5-bit mask
    machine = (id >> 12) & 0x1F
    sequence = id & 0xFFF
    
    import datetime
    dt = datetime.datetime.fromtimestamp(timestamp)
    
    return {
        'timestamp': dt.isoformat(),
        'datacenter_id': datacenter,
        'machine_id': machine,
        'sequence': sequence
    }

print(decode_snowflake(7045391791689121795))
# {
#   'timestamp': '2024-01-15T10:23:45.123',
#   'datacenter_id': 1,
#   'machine_id': 3,
#   'sequence': 3
# }

Click to expand code...

The Clock Synchronization Problem

Snowflake relies on wall-clock time. What if the server's clock moves backward (NTP sync, leap seconds)?

Problem

10:00:00.100 → Generate ID: 123456
10:00:00.095 ← Clock goes backwards!
10:00:00.095 → Generate ID: 123455 (COLLISION!)

Click to expand code...

Solutions

Solution 1: Refuse to Generate (Strict)

python

if timestamp < self.last_timestamp:
    raise Exception(f"Clock moved backwards. Refusing to generate ID")

Click to expand code...

Pros: Guarantees uniqueness
Cons: Service downtime during clock skew

Solution 2: Wait Until Clock Catches Up

python

if timestamp < self.last_timestamp:
    # Wait up to 5ms for clock to catch up
    for _ in range(5):
        time.sleep(0.001)
        timestamp = self._current_millis()
        if timestamp >= self.last_timestamp:
            break
    else:
        raise Exception("Clock skew too large")

Click to expand code...

Solution 3: Use Sequence Bits as Buffer

python

if timestamp < self.last_timestamp:
    # Use extra sequence bits for small skews (< 1ms)
    if self.last_timestamp - timestamp < 10:
        timestamp = self.last_timestamp
        # Continue with sequence increment
    else:
        raise Exception("Clock skew too large")

Click to expand code...

Clock Synchronization Best Practices

Use NTP or PTP: Keep clocks synchronized within 100ms
Monitor clock drift: Alert if drift > 50ms
Gradual adjustments: Use ntpd instead of ntpdate (gradual vs sudden)
Disable leap second smearing: Or handle explicitly

Production Examples

Twitter Snowflake

Original implementation: Powers tweet IDs, user IDs, DM IDs
Capacity: 10,000+ IDs per second per machine
Scale: Billions of tweets since 2010

Example Tweet ID:

1749823749823700000

Click to expand code...

This encodes:

When it was tweeted
Which datacenter
Which server
Sequence within that millisecond

Instagram Snowflake

Instagram uses a modified Snowflake with:

41 bits: timestamp
13 bits: shard ID (8,192 database shards)
10 bits: sequence (1,024 per ms per shard)

Discord Snowflake

Discord uses Snowflake for message IDs, channel IDs, user IDs:

javascript

// Extract timestamp from Discord message ID
const timestamp = (messageId >> 22) + 1420070400000;
const date = new Date(timestamp);

Click to expand code...

MongoDB ObjectId

Similar concept but 96-bit:

4 bytes: Unix timestamp
5 bytes: Random value
3 bytes: Incrementing counter

javascript

const ObjectId = require('mongodb').ObjectId;
const id = new ObjectId();
console.log(id.getTimestamp());  // Embedded timestamp!

Click to expand code...

Comparison Table

Method	Size	Sortable?	Coordination?	IDs/sec	Use Case
UUID v4	128-bit	❌ No	None	Unlimited	Request IDs, temp keys
UUID v7	128-bit	✅ Yes	None	Unlimited	General purpose, no custom infra
Ticket Server	64-bit	✅ Yes	High (Network)	~10K	Legacy systems
Snowflake	64-bit	✅ Yes	None (Internal)	Millions	Primary keys at scale

Interview Tips 💡

When discussing ID generation in interviews:

Start with requirements: "Are IDs time-sortable? What's the expected QPS?"
Compare approaches: "UUID v7 is simple but 128-bit. Snowflake is 64-bit and sortable but needs clock sync."
Explain Snowflake: "41-bit timestamp, 10-bit worker ID, 12-bit sequence = 4M IDs/sec per machine"
Mention clock skew: "We must handle clock going backwards. Strict mode refuses, graceful mode waits."
Real example: "Twitter generates tweet IDs using Snowflake - 64-bit, sortable, billions created daily"
Trade-offs: "UUID v7 is stateless (works anywhere). Snowflake needs assigned machine IDs but is more compact."

Common Pitfalls

⚠️ Using AUTO_INCREMENT in distributed databases: Leads to gaps and coordination overhead

⚠️ UUID v4 for primary keys: Causes index fragmentation and poor query performance

⚠️ Ignoring clock skew: Can generate duplicate IDs if system clock jumps backwards

⚠️ Hardcoding epoch: Choose an epoch far enough in the past to avoid offset issues

⚠️ Not monitoring ID generation: Track sequence overflows and clock skew events

Related Concepts

Consistent Hashing — Distributing keys across nodes
Database Sharding — Horizontal partitioning strategies
Distributed Transactions — ACID in distributed systems
Leader Election — Choosing a coordinator
CAP Theorem — Trade-offs in distributed systems

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Beginner

Heartbeat Protocol

A mechanism for failure detection in distributed systems where nodes send periodic signals to indicate they are still alive. Implementation strategies, timeout tuning, and production patterns.

ReliabilityFailure DetectionDistributed Systems

Pro

Raft Consensus Algorithm

A comprehensive guide to Raft, the consensus algorithm powering Etcd, Consul, and Kubernetes. Leader election, log replication, safety guarantees, and production deployment patterns.

Distributed SystemsAlgorithmsConsensus

Intermediate

ACID vs BASE: Consistency Models

The two philosophies of database transaction handling: Strict guarantees (ACID) versus flexible availability (BASE). Deep dive into isolation levels, transaction anomalies, and hybrid approaches.

DatabaseTransactionsConsistency

Distributed Unique ID Generation

Generating Unique IDs at Scale

Requirements

Approach 1: UUID (Universally Unique Identifier)

UUID v4 (Random)

UUID v7 (Time-Ordered) ⭐

Approach 2: Central Ticket Server (Flickr Strategy)

Approach 3: Twitter Snowflake ⭐⭐⭐ (Recommended)

Snowflake ID Generator

Bit Structure

Complete Python Implementation

Decoding Snowflake IDs

The Clock Synchronization Problem

Problem

Solutions

Solution 1: Refuse to Generate (Strict)

Solution 2: Wait Until Clock Catches Up

Solution 3: Use Sequence Bits as Buffer

Clock Synchronization Best Practices

Production Examples

Twitter Snowflake

Instagram Snowflake

Discord Snowflake

MongoDB ObjectId

Comparison Table

Interview Tips 💡

Common Pitfalls

Related Concepts

About ScaleWiki

Related Articles

Heartbeat Protocol

Raft Consensus Algorithm

ACID vs BASE: Consistency Models