Back to All Concepts
System DesignDatabaseDistributed SystemsAlgorithmsIntermediate

Distributed Unique ID Generation

How to generate unique IDs in a distributed system without coordination. Twitter Snowflake, UUID v4 vs v7, Clock skew issues, and production implementations.

Generating Unique IDs at Scale

In a single-server setup, AUTO_INCREMENT is perfect. In a distributed system with 100 database shards, it fails immediately.

The Challenge: How do you generate globally unique, sortable, compact IDs across hundreds of servers without a centralized coordinator?

Requirements

A good distributed ID system must satisfy:

  • Globally Unique: No two objects can ever share an ID (Collisions = Data Corruption).
  • Sortable: IDs should be time-ordered so ID1 < ID2 if Event1 happened before Event2.
  • Compact: 64-bit integers are better for database indexing than 128-bit strings.
  • High Availability: ID generation cannot be a single point of failure (SPOF).
  • High Performance: Must generate millions of IDs per second across the cluster.

Approach 1: UUID (Universally Unique Identifier)

A 128-bit number usually displayed as hex string: 123e4567-e89b-12d3-a456-426614174000.

UUID v4 (Random)

Completely random bits based on cryptographic randomness.

python
import uuid

id = uuid.uuid4()
print(id)  # e.g., '3c9f1b7a-4f3e-4e1a-9c2b-8d7e6f5a4b3c'
Click to expand code...

Pros:

  • Can be generated offline by the client
  • Zero coordination needed
  • Extremely low collision probability (~0% for practical purposes)

Cons:

  • Not sortable: Random ordering
  • Database unfriendly: Indexing random UUIDs causes B-Tree page fragmentation
  • Large size: 128 bits = 16 bytes (vs 8 bytes for int64)

Performance Impact:

sql
-- Sequential IDs: New rows go to the end (fast)
INSERT INTO posts (id, ...) VALUES (1001, ...), (1002, ...);

-- Random UUIDs: New rows scattered everywhere (slow, fragmentation)
INSERT INTO posts (id, ...) VALUES (uuid(), ...), (uuid(), ...);
Click to expand code...

UUID v7 (Time-Ordered) ⭐

Modern UUID standard (RFC 9562) that embeds timestamp for sortability.

Structure:

  • 48 bits: Unix timestamp in milliseconds
  • 12 bits: Randomized sequence
  • 62 bits: Random data
  • 6 bits: Version and variant markers
javascript
// JavaScript implementation
function uuidv7() {
  const timestamp = Date.now();
  const timestampHex = timestamp.toString(16).padStart(12, '0');
  const randomHex = Math.random().toString(16).substr(2, 16);
  
  return `${timestampHex.substr(0, 8)}-${timestampHex.substr(8, 4)}-7${randomHex.substr(0, 3)}-${randomHex.substr(3, 4)}-${randomHex.substr(7)}`;
}

console.log(uuidv7());  // '018d3e2a-1b2c-7d4e-a5f6-b7c8d9e0f1a2'
Click to expand code...

Pros:

  • Sortable! Preserves insertion order
  • Database friendly (sequential writes)
  • No coordination needed

Cons:

  • Still 128-bit (2x larger than int64)
  • Requires storing as string or binary in most databases

Use Case: Ideal for databases that don't support custom ID generation but need sortability.


Approach 2: Central Ticket Server (Flickr Strategy)

Use a centralized database just to hand out IDs.

MySQL Implementation:

sql
CREATE TABLE Tickets64 (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  stub CHAR(1) NOT NULL DEFAULT ''
);

-- Get next ID
REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
Click to expand code...

Python Client:

python
def get_next_id():
    cursor.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    cursor.execute("SELECT LAST_INSERT_ID()")
    return cursor.fetchone()[0]
Click to expand code...

Pros:

  • Simple to implement
  • Guarantees sequential numbers
  • Works with existing infrastructure

Cons:

  • Single Point of Failure: If this DB dies, you can't create anything
  • High latency: Network roundtrip for every ID (slow!)
  • Limited throughput: ~10k IDs/sec max (database bottleneck)

Multi-Master Variant:

sql
-- Server 1: IDs 1, 3, 5, 7, 9...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 1;

-- Server 2: IDs 2, 4, 6, 8, 10...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 2;
Click to expand code...

Reduces SPOF but IDs are no longer sequential.


Approach 3: Twitter Snowflake ⭐⭐⭐ (Recommended)

Snowflake ID Generator

Timestamp (Epoch) (41 bits)
1779023909678
...
Machine ID (10 bits)
0101010101
Sequence (12 bits)
0000
17790239096780000
64-bit INT
Recently Generated
Click Generate...

A 64-bit integer composed of bit-packed fields. This is the industry standard.

Bit Structure

ComponentBitsRangeDescription
Sign Bit1Always 0Keeps ID positive
Timestamp410-2^41Milliseconds since custom epoch (69 years)
Datacenter ID50-31Identifies physical datacenter
Machine ID50-31Identifies server within datacenter
Sequence120-4095Incrementing counter per millisecond

Total: 64 bits (fits in a BIGINT)

Capacity:

  • Per Machine: 4,096 IDs per millisecond = 4,096,000 IDs/sec
  • Per Datacenter: 32 machines × 4M = 131M IDs/sec
  • Total Cluster: 32 datacenters × 131M = 4.2 billion IDs/sec

Complete Python Implementation

python
import time
import threading

class SnowflakeIDGenerator:
    def __init__(self, datacenter_id, machine_id, custom_epoch=1609459200000):
        # Validation
        if datacenter_id < 0 or datacenter_id > 31:
            raise ValueError("Datacenter ID must be 0-31")
        if machine_id < 0 or machine_id > 31:
            raise ValueError("Machine ID must be 0-31")
        
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.custom_epoch = custom_epoch  # Jan 1, 2021
        
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()
    
    def _current_millis(self):
        return int(time.time() * 1000)
    
    def _wait_next_millis(self, last_timestamp):
        timestamp = self._current_millis()
        while timestamp <= last_timestamp:
            timestamp = self._current_millis()
        return timestamp
    
    def generate_id(self):
        with self.lock:
            timestamp = self._current_millis()
            
            # Clock moved backwards!
            if timestamp < self.last_timestamp:
                raise Exception(f"Clock moved backwards. Refusing to generate ID for {self.last_timestamp - timestamp}ms")
            
            # Same millisecond - increment sequence
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit mask
                
                # Sequence overflow - wait for next millisecond
                if self.sequence == 0:
                    timestamp = self._wait_next_millis(self.last_timestamp)
            else:
                # New millisecond - reset sequence
                self.sequence = 0
            
            self.last_timestamp = timestamp
            
            # Build the ID
            id = ((timestamp - self.custom_epoch) << 22) | \
                 (self.datacenter_id << 17) | \
                 (self.machine_id << 12) | \
                 self.sequence
            
            return id

# Usage
generator = SnowflakeIDGenerator(datacenter_id=1, machine_id=3)

for _ in range(5):
    print(generator.generate_id())
    
# Output (example):
# 7045391791689121792
# 7045391791689121793
# 7045391791689121794
# 7045391791689121795
# 7045391791689121796
Click to expand code...

Decoding Snowflake IDs

python
def decode_snowflake(id, custom_epoch=1609459200000):
    timestamp = ((id >> 22) + custom_epoch) / 1000
    datacenter = (id >> 17) & 0x1F  # 5-bit mask
    machine = (id >> 12) & 0x1F
    sequence = id & 0xFFF
    
    import datetime
    dt = datetime.datetime.fromtimestamp(timestamp)
    
    return {
        'timestamp': dt.isoformat(),
        'datacenter_id': datacenter,
        'machine_id': machine,
        'sequence': sequence
    }

print(decode_snowflake(7045391791689121795))
# {
#   'timestamp': '2024-01-15T10:23:45.123',
#   'datacenter_id': 1,
#   'machine_id': 3,
#   'sequence': 3
# }
Click to expand code...

The Clock Synchronization Problem

Snowflake relies on wall-clock time. What if the server's clock moves backward (NTP sync, leap seconds)?

Problem

10:00:00.100 → Generate ID: 123456
10:00:00.095 ← Clock goes backwards!
10:00:00.095 → Generate ID: 123455 (COLLISION!)
Click to expand code...

Solutions

Solution 1: Refuse to Generate (Strict)

python
if timestamp < self.last_timestamp:
    raise Exception(f"Clock moved backwards. Refusing to generate ID")
Click to expand code...

Pros: Guarantees uniqueness
Cons: Service downtime during clock skew

Solution 2: Wait Until Clock Catches Up

python
if timestamp < self.last_timestamp:
    # Wait up to 5ms for clock to catch up
    for _ in range(5):
        time.sleep(0.001)
        timestamp = self._current_millis()
        if timestamp >= self.last_timestamp:
            break
    else:
        raise Exception("Clock skew too large")
Click to expand code...

Solution 3: Use Sequence Bits as Buffer

python
if timestamp < self.last_timestamp:
    # Use extra sequence bits for small skews (< 1ms)
    if self.last_timestamp - timestamp < 10:
        timestamp = self.last_timestamp
        # Continue with sequence increment
    else:
        raise Exception("Clock skew too large")
Click to expand code...

Clock Synchronization Best Practices

  1. Use NTP or PTP: Keep clocks synchronized within 100ms
  2. Monitor clock drift: Alert if drift > 50ms
  3. Gradual adjustments: Use ntpd instead of ntpdate (gradual vs sudden)
  4. Disable leap second smearing: Or handle explicitly

Production Examples

Twitter Snowflake

  • Original implementation: Powers tweet IDs, user IDs, DM IDs
  • Capacity: 10,000+ IDs per second per machine
  • Scale: Billions of tweets since 2010

Example Tweet ID:

1749823749823700000
Click to expand code...

This encodes:

  • When it was tweeted
  • Which datacenter
  • Which server
  • Sequence within that millisecond

Instagram Snowflake

Instagram uses a modified Snowflake with:

  • 41 bits: timestamp
  • 13 bits: shard ID (8,192 database shards)
  • 10 bits: sequence (1,024 per ms per shard)

Discord Snowflake

Discord uses Snowflake for message IDs, channel IDs, user IDs:

javascript
// Extract timestamp from Discord message ID
const timestamp = (messageId >> 22) + 1420070400000;
const date = new Date(timestamp);
Click to expand code...

MongoDB ObjectId

Similar concept but 96-bit:

  • 4 bytes: Unix timestamp
  • 5 bytes: Random value
  • 3 bytes: Incrementing counter
javascript
const ObjectId = require('mongodb').ObjectId;
const id = new ObjectId();
console.log(id.getTimestamp());  // Embedded timestamp!
Click to expand code...

Comparison Table

MethodSizeSortable?Coordination?IDs/secUse Case
UUID v4128-bit❌ NoNoneUnlimitedRequest IDs, temp keys
UUID v7128-bit✅ YesNoneUnlimitedGeneral purpose, no custom infra
Ticket Server64-bit✅ YesHigh (Network)~10KLegacy systems
Snowflake64-bit✅ YesNone (Internal)MillionsPrimary keys at scale

Interview Tips 💡

When discussing ID generation in interviews:

  • Start with requirements: "Are IDs time-sortable? What's the expected QPS?"
  • Compare approaches: "UUID v7 is simple but 128-bit. Snowflake is 64-bit and sortable but needs clock sync."
  • Explain Snowflake: "41-bit timestamp, 10-bit worker ID, 12-bit sequence = 4M IDs/sec per machine"
  • Mention clock skew: "We must handle clock going backwards. Strict mode refuses, graceful mode waits."
  • Real example: "Twitter generates tweet IDs using Snowflake - 64-bit, sortable, billions created daily"
  • Trade-offs: "UUID v7 is stateless (works anywhere). Snowflake needs assigned machine IDs but is more compact."

Common Pitfalls

⚠️ Using AUTO_INCREMENT in distributed databases: Leads to gaps and coordination overhead

⚠️ UUID v4 for primary keys: Causes index fragmentation and poor query performance

⚠️ Ignoring clock skew: Can generate duplicate IDs if system clock jumps backwards

⚠️ Hardcoding epoch: Choose an epoch far enough in the past to avoid offset issues

⚠️ Not monitoring ID generation: Track sequence overflows and clock skew events


Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles