What is Geohashing?

Geohashing is a geocoding method that converts two-dimensional geographic coordinates (latitude and longitude) into a single alphanumeric string. This encoding enables efficient proximity searches, spatial indexing, and location-based queries in databases.

Think of it as giving every location on Earth a unique postal code, where nearby places share similar prefixes.

The Problem: 2D Proximity Search

Finding nearby locations with raw coordinates is computationally expensive:

sql

-- Slow: Requires scanning entire table with range checks on TWO columns
SELECT * FROM restaurants 
WHERE lat BETWEEN 37.7 AND 37.8 
  AND lon BETWEEN -122.5 AND -122.4;

Click to expand code...

Challenges:

Can't create a single index on two columns effectively
Range queries on multiple dimensions are slow
Difficult to cache or shard by location
Complex distance calculations (Haversine formula)

The Solution: Reduce Dimensions

Geohashing solves this by converting (lat, lon) into a single string:

San Francisco: (37.7749, -122.4194) → "9q8yy"

Click to expand code...

Now proximity becomes a simple prefix match:

sql

-- Fast: Single column index, prefix search
SELECT * FROM restaurants 
WHERE geohash LIKE '9q8%';

Click to expand code...

How Geohashing Works

1. Divide the World into a Grid

The algorithm recursively divides the world into smaller rectangles using binary space partitioning:

mermaid

graph TD
    World["World: lat [-90, 90], lon [-180, 180]"]
    World -->|"lon ≥ 0 → 1"| East["Eastern Hemisphere"]
    World -->|"lon < 0 → 0"| West["Western Hemisphere"]
    West -->|"lat ≥ 0 → 1"| NW["Northwest Quadrant"]
    West -->|"lat < 0 → 0"| SW["Southwest Quadrant"]

Click to expand code...

2. Interleave Bits (Z-Order Curve)

Alternate between longitude and latitude bits to create a single binary number:

Example: San Francisco (37.7749, -122.4194)

Step 1: Binary search on longitude [-180, 180]
-122.4194 < 0 → bit 0

Step 2: Binary search on latitude [-90, 90]
37.7749 > 0 → bit 1

Step 3: Longitude [-180, 0]
-122.4194 < -90 → bit 0

Step 4: Latitude [0, 90]
37.7749 < 45 → bit 0

...continue for desired precision...

Binary: 01001110110111001110110110111001

Click to expand code...

3. Base32 Encoding

Convert binary to base32 using this alphabet:

0123456789bcdefghjkmnpqrstuvwxyz
(vowels removed to avoid accidental words)

Click to expand code...

Result: 9q8yy9r (7 characters)

Precision Levels

Geohash length determines the bounding box size:

Length	Cell Width	Cell Height	Example	Use Case
1	≤ 5,000km	≤ 5,000km	`9`	Continent
2	≤ 1,250km	≤ 625km	`9q`	Country/State
3	≤ 156km	≤ 156km	`9q8`	Large City
4	≤ 39km	≤ 19.5km	`9q8y`	City District
5	≤ 4.9km	≤ 4.9km	`9q8yy`	Neighborhood
6	≤ 1.2km	≤ 0.6km	`9q8yy9`	Street
7	≤ 152m	≤ 152m	`9q8yy9r`	Building
8	≤ 38m	≤ 19m	`9q8yy9r3`	House
9	≤ 4.8m	≤ 4.8m	`9q8yy9r3v`	Room

Rule of thumb: Each additional character increases precision by ~5x.

Implementation Examples

Python

python

def encode_geohash(lat, lon, precision=7):
    """Encode lat/lon to geohash string"""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_range, lon_range = [-90, 90], [-180, 180]
    geohash = []
    bits = 0
    bit = 0
    even = True  # Start with longitude
    
    while len(geohash) < precision:
        if even:  # Longitude
            mid = (lon_range[0] + lon_range[1]) / 2
            if lon > mid:
                bit |= (1 << (4 - bits))
                lon_range[0] = mid
            else:
                lon_range[1] = mid
        else:  # Latitude
            mid = (lat_range[0] + lat_range[1]) / 2
            if lat > mid:
                bit |= (1 << (4 - bits))
                lat_range[0] = mid
            else:
                lat_range[1] = mid
        
        even = not even
        bits += 1
        
        if bits == 5:
            geohash.append(base32[bit])
            bits = 0
            bit = 0
    
    return ''.join(geohash)

# Example usage
print(encode_geohash(37.7749, -122.4194, 7))  # Output: 9q8yy9r

Click to expand code...

JavaScript (Using Library)

javascript

import ngeohash from 'ngeohash';

const location = { lat: 37.7749, lon: -122.4194 };
const hash = ngeohash.encode(location.lat, location.lon, 7);
console.log(hash); // "9q8yy9r"

// Decode back to coordinates
const coords = ngeohash.decode(hash);
console.log(coords); // { latitude: 37.775, longitude: -122.419 }

// Get bounding box
const bbox = ngeohash.decode_bbox(hash);
// [minLat, minLon, maxLat, maxLon]

Click to expand code...

Proximity Search Patterns

1. Prefix Matching

Find all points within the same geohash cell:

sql

SELECT * FROM places 
WHERE geohash LIKE '9q8yy%'
ORDER BY geohash;

Click to expand code...

2. Neighbor Search

Geohash cells have 8 neighbors. To search nearby:

javascript

const neighbors = ngeohash.neighbors('9q8yy');
// ['9q8yv', '9q8yy', '9q8yz', '9q8yw', '9q8yx', ...]

const query = `
  SELECT * FROM places
  WHERE geohash IN (${neighbors.map(n => `'${n}'`).join(',')})
`;

Click to expand code...

3. Expanding Search Radius

Start with current cell, expand to neighbors if needed:

python

def find_nearby(lat, lon, max_distance_km):
    precision = 5  # Start with ~5km cells
    hash = encode_geohash(lat, lon, precision)
    
    while True:
        results = query_db(f"WHERE geohash LIKE '{hash[:precision]}%'")
        if len(results) > MIN_RESULTS or precision == 1:
            return filter_by_distance(results, lat, lon, max_distance_km)
        precision -= 1  # Expand search area

Click to expand code...

Edge Cases and Limitations

1. Geohash Boundary Problem

Points at geohash boundaries may be close but have different prefixes:

Point A: (37.7749, -122.4195) → 9q8yy8
Point B: (37.7750, -122.4194) → 9q8yy9
Distance: ~10 meters, but different hashes!

Click to expand code...

Solution: Always check neighboring cells for proximity searches.

2. Shape Distortion

Geohash cells are rectangles, not squares. Near poles, cells become very narrow:

At equator: Cell is roughly square
At 60° latitude: Cell is 2x wider than tall
At poles: Cell is extremely elongated

3. Dateline and Pole Issues

International Date Line: Longitude wraps from 180° to -180°
Poles: Longitude becomes meaningless at ±90° latitude

Solution: Use alternative systems (S2 Geometry) for global applications.

Geohashing vs. Alternatives

System	Pros	Cons	Best For
Geohash	Simple, human-readable, prefix matching	Cell boundaries, shape distortion	General proximity search
Quadtree	Hierarchical structure	More complex to implement	Varying density areas
S2 Geometry	No edge cases, consistent cell shapes	Complex, not human-readable	Global-scale apps
H3 (Uber)	Hexagonal cells, no gaps	Licensing, specialized	Ride-sharing, spatial analysis

Real-World Use Cases

Uber / Yelp: "Find nearby drivers/restaurants"

javascript

// Find drivers near user (37.7749, -122.4194)
const userHash = geohash.encode(37.7749, -122.4194, 6); // "9q8yy9"
const nearby = geohash.neighbors(userHash);

db.drivers.find({ 
  geohash: { $in: [userHash, ...nearby] },
  available: true 
});

Click to expand code...

MongoDB Geospatial Indexes

javascript

// Create geohash index
db.places.createIndex({ geohash: 1 });

// Query uses index efficiently
db.places.find({ geohash: /^9q8yy/ }).explain();

Click to expand code...

Redis Geo Commands

Redis internally uses geohash for geospatial features:

bash

GEOADD locations 37.7749 -122.4194 "San Francisco"
GEORADIUS locations 37.7 -122.4 10 km

Click to expand code...

Caching Location Data

python

# Cache key uses geohash for locality
cache_key = f"restaurants:{geohash[:5]}"
results = cache.get(cache_key)

if not results:
    results = db.query(f"WHERE geohash LIKE '{geohash[:5]}%'")
    cache.set(cache_key, results, ttl=300)

Click to expand code...

Interview Tips 💡

When discussing geohashing in system design interviews:

Explain the problem first: "We need to efficiently find nearby locations, which is hard with 2D coordinates..."
Justify geohash choice: "Geohashing converts 2D to 1D, enabling prefix-based searches and single-column indexes..."
Mention precision: "We'd use 6-character hashes for city-level searches (~1km cells)..."
Address edge cases: "We'd check neighboring cells since points near boundaries might have different hashes..."
Discuss alternatives: "For a global app, we might consider S2 Geometry to avoid pole distortion..."
Show awareness of libraries: "In production, we'd use proven libraries like ngeohash or Redis GEO commands..."

Related Concepts

Quadtrees — Alternative hierarchical spatial indexing
Consistent Hashing — Similar hashing concept for distributed systems
Database Sharding — Can shard by geohash for geographic data
Bloom Filters — Another space-efficient probabilistic structure
Caching — Geohash enables location-based cache keys

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Pro

Raft Consensus Algorithm

A comprehensive guide to Raft, the consensus algorithm powering Etcd, Consul, and Kubernetes. Leader election, log replication, safety guarantees, and production deployment patterns.

Distributed SystemsAlgorithmsConsensus

Intermediate

Token Bucket vs Leaky Bucket

A comprehensive comparison of two fundamental rate limiting algorithms: Token Bucket allows controlled bursts while Leaky Bucket enforces constant output rates.

System DesignAlgorithmsRate Limiting

Pro

Top-K Heavy Hitters (Streaming)

How to find 'Trending Topics' in massive real-time streams. MapReduce, Count-Min Sketches, and Sliding Windows.

AlgorithmsBig DataStreaming

Geohashing (Location Encoding)

What is Geohashing?

The Problem: 2D Proximity Search

The Solution: Reduce Dimensions

How Geohashing Works

1. Divide the World into a Grid

2. Interleave Bits (Z-Order Curve)

3. Base32 Encoding

Precision Levels

Implementation Examples

Python

JavaScript (Using Library)

Proximity Search Patterns

1. Prefix Matching

2. Neighbor Search

3. Expanding Search Radius

Edge Cases and Limitations

1. Geohash Boundary Problem

2. Shape Distortion

3. Dateline and Pole Issues

Geohashing vs. Alternatives

Real-World Use Cases

Uber / Yelp: "Find nearby drivers/restaurants"

MongoDB Geospatial Indexes

Redis Geo Commands

Caching Location Data

Interview Tips 💡

Related Concepts

About ScaleWiki

Related Articles

Raft Consensus Algorithm

Token Bucket vs Leaky Bucket

Top-K Heavy Hitters (Streaming)