Back to All Concepts
System DesignGeospatialAlgorithmsIntermediate

Geohashing (Location Encoding)

A geocoding system that encodes latitude/longitude coordinates into short alphanumeric strings for efficient proximity searches and spatial indexing.

What is Geohashing?

Geohashing is a geocoding method that converts two-dimensional geographic coordinates (latitude and longitude) into a single alphanumeric string. This encoding enables efficient proximity searches, spatial indexing, and location-based queries in databases.

Think of it as giving every location on Earth a unique postal code, where nearby places share similar prefixes.

The Problem: 2D Proximity Search

Finding nearby locations with raw coordinates is computationally expensive:

sql
-- Slow: Requires scanning entire table with range checks on TWO columns
SELECT * FROM restaurants 
WHERE lat BETWEEN 37.7 AND 37.8 
  AND lon BETWEEN -122.5 AND -122.4;
Click to expand code...

Challenges:

  • Can't create a single index on two columns effectively
  • Range queries on multiple dimensions are slow
  • Difficult to cache or shard by location
  • Complex distance calculations (Haversine formula)

The Solution: Reduce Dimensions

Geohashing solves this by converting (lat, lon) into a single string:

San Francisco: (37.7749, -122.4194) → "9q8yy"
Click to expand code...

Now proximity becomes a simple prefix match:

sql
-- Fast: Single column index, prefix search
SELECT * FROM restaurants 
WHERE geohash LIKE '9q8%';
Click to expand code...

How Geohashing Works

1. Divide the World into a Grid

The algorithm recursively divides the world into smaller rectangles using binary space partitioning:

mermaid
graph TD
    World["World: lat [-90, 90], lon [-180, 180]"]
    World -->|"lon ≥ 0 → 1"| East["Eastern Hemisphere"]
    World -->|"lon < 0 → 0"| West["Western Hemisphere"]
    West -->|"lat ≥ 0 → 1"| NW["Northwest Quadrant"]
    West -->|"lat < 0 → 0"| SW["Southwest Quadrant"]
Click to expand code...

2. Interleave Bits (Z-Order Curve)

Alternate between longitude and latitude bits to create a single binary number:

Example: San Francisco (37.7749, -122.4194)

Step 1: Binary search on longitude [-180, 180]
-122.4194 < 0 → bit 0

Step 2: Binary search on latitude [-90, 90]
37.7749 > 0 → bit 1

Step 3: Longitude [-180, 0]
-122.4194 < -90 → bit 0

Step 4: Latitude [0, 90]
37.7749 < 45 → bit 0

...continue for desired precision...

Binary: 01001110110111001110110110111001
Click to expand code...

3. Base32 Encoding

Convert binary to base32 using this alphabet:

0123456789bcdefghjkmnpqrstuvwxyz
(vowels removed to avoid accidental words)
Click to expand code...

Result: 9q8yy9r (7 characters)

Precision Levels

Geohash length determines the bounding box size:

LengthCell WidthCell HeightExampleUse Case
1≤ 5,000km≤ 5,000km9Continent
2≤ 1,250km≤ 625km9qCountry/State
3≤ 156km≤ 156km9q8Large City
4≤ 39km≤ 19.5km9q8yCity District
5≤ 4.9km≤ 4.9km9q8yyNeighborhood
6≤ 1.2km≤ 0.6km9q8yy9Street
7≤ 152m≤ 152m9q8yy9rBuilding
8≤ 38m≤ 19m9q8yy9r3House
9≤ 4.8m≤ 4.8m9q8yy9r3vRoom

Rule of thumb: Each additional character increases precision by ~5x.

Implementation Examples

Python

python
def encode_geohash(lat, lon, precision=7):
    """Encode lat/lon to geohash string"""
    base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_range, lon_range = [-90, 90], [-180, 180]
    geohash = []
    bits = 0
    bit = 0
    even = True  # Start with longitude
    
    while len(geohash) < precision:
        if even:  # Longitude
            mid = (lon_range[0] + lon_range[1]) / 2
            if lon > mid:
                bit |= (1 << (4 - bits))
                lon_range[0] = mid
            else:
                lon_range[1] = mid
        else:  # Latitude
            mid = (lat_range[0] + lat_range[1]) / 2
            if lat > mid:
                bit |= (1 << (4 - bits))
                lat_range[0] = mid
            else:
                lat_range[1] = mid
        
        even = not even
        bits += 1
        
        if bits == 5:
            geohash.append(base32[bit])
            bits = 0
            bit = 0
    
    return ''.join(geohash)

# Example usage
print(encode_geohash(37.7749, -122.4194, 7))  # Output: 9q8yy9r
Click to expand code...

JavaScript (Using Library)

javascript
import ngeohash from 'ngeohash';

const location = { lat: 37.7749, lon: -122.4194 };
const hash = ngeohash.encode(location.lat, location.lon, 7);
console.log(hash); // "9q8yy9r"

// Decode back to coordinates
const coords = ngeohash.decode(hash);
console.log(coords); // { latitude: 37.775, longitude: -122.419 }

// Get bounding box
const bbox = ngeohash.decode_bbox(hash);
// [minLat, minLon, maxLat, maxLon]
Click to expand code...

Proximity Search Patterns

1. Prefix Matching

Find all points within the same geohash cell:

sql
SELECT * FROM places 
WHERE geohash LIKE '9q8yy%'
ORDER BY geohash;
Click to expand code...

2. Neighbor Search

Geohash cells have 8 neighbors. To search nearby:

javascript
const neighbors = ngeohash.neighbors('9q8yy');
// ['9q8yv', '9q8yy', '9q8yz', '9q8yw', '9q8yx', ...]

const query = `
  SELECT * FROM places
  WHERE geohash IN (${neighbors.map(n => `'${n}'`).join(',')})
`;
Click to expand code...

3. Expanding Search Radius

Start with current cell, expand to neighbors if needed:

python
def find_nearby(lat, lon, max_distance_km):
    precision = 5  # Start with ~5km cells
    hash = encode_geohash(lat, lon, precision)
    
    while True:
        results = query_db(f"WHERE geohash LIKE '{hash[:precision]}%'")
        if len(results) > MIN_RESULTS or precision == 1:
            return filter_by_distance(results, lat, lon, max_distance_km)
        precision -= 1  # Expand search area
Click to expand code...

Edge Cases and Limitations

1. Geohash Boundary Problem

Points at geohash boundaries may be close but have different prefixes:

Point A: (37.7749, -122.4195) → 9q8yy8
Point B: (37.7750, -122.4194) → 9q8yy9
Distance: ~10 meters, but different hashes!
Click to expand code...

Solution: Always check neighboring cells for proximity searches.

2. Shape Distortion

Geohash cells are rectangles, not squares. Near poles, cells become very narrow:

  • At equator: Cell is roughly square
  • At 60° latitude: Cell is 2x wider than tall
  • At poles: Cell is extremely elongated

3. Dateline and Pole Issues

  • International Date Line: Longitude wraps from 180° to -180°
  • Poles: Longitude becomes meaningless at ±90° latitude

Solution: Use alternative systems (S2 Geometry) for global applications.

Geohashing vs. Alternatives

SystemProsConsBest For
GeohashSimple, human-readable, prefix matchingCell boundaries, shape distortionGeneral proximity search
QuadtreeHierarchical structureMore complex to implementVarying density areas
S2 GeometryNo edge cases, consistent cell shapesComplex, not human-readableGlobal-scale apps
H3 (Uber)Hexagonal cells, no gapsLicensing, specializedRide-sharing, spatial analysis

Real-World Use Cases

Uber / Yelp: "Find nearby drivers/restaurants"

javascript
// Find drivers near user (37.7749, -122.4194)
const userHash = geohash.encode(37.7749, -122.4194, 6); // "9q8yy9"
const nearby = geohash.neighbors(userHash);

db.drivers.find({ 
  geohash: { $in: [userHash, ...nearby] },
  available: true 
});
Click to expand code...

MongoDB Geospatial Indexes

javascript
// Create geohash index
db.places.createIndex({ geohash: 1 });

// Query uses index efficiently
db.places.find({ geohash: /^9q8yy/ }).explain();
Click to expand code...

Redis Geo Commands

Redis internally uses geohash for geospatial features:

bash
GEOADD locations 37.7749 -122.4194 "San Francisco"
GEORADIUS locations 37.7 -122.4 10 km
Click to expand code...

Caching Location Data

python
# Cache key uses geohash for locality
cache_key = f"restaurants:{geohash[:5]}"
results = cache.get(cache_key)

if not results:
    results = db.query(f"WHERE geohash LIKE '{geohash[:5]}%'")
    cache.set(cache_key, results, ttl=300)
Click to expand code...

Interview Tips 💡

When discussing geohashing in system design interviews:

  1. Explain the problem first: "We need to efficiently find nearby locations, which is hard with 2D coordinates..."
  2. Justify geohash choice: "Geohashing converts 2D to 1D, enabling prefix-based searches and single-column indexes..."
  3. Mention precision: "We'd use 6-character hashes for city-level searches (~1km cells)..."
  4. Address edge cases: "We'd check neighboring cells since points near boundaries might have different hashes..."
  5. Discuss alternatives: "For a global app, we might consider S2 Geometry to avoid pole distortion..."
  6. Show awareness of libraries: "In production, we'd use proven libraries like ngeohash or Redis GEO commands..."

Related Concepts

About ScaleWiki

ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.

Read more about our Editorial Guidelines & Authorship.

Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.

Related Articles