What is Geohashing?
Geohashing is a geocoding method that converts two-dimensional geographic coordinates (latitude and longitude) into a single alphanumeric string. This encoding enables efficient proximity searches, spatial indexing, and location-based queries in databases.
Think of it as giving every location on Earth a unique postal code, where nearby places share similar prefixes.
The Problem: 2D Proximity Search
Finding nearby locations with raw coordinates is computationally expensive:
-- Slow: Requires scanning entire table with range checks on TWO columns SELECT * FROM restaurants WHERE lat BETWEEN 37.7 AND 37.8 AND lon BETWEEN -122.5 AND -122.4;
Challenges:
- Can't create a single index on two columns effectively
- Range queries on multiple dimensions are slow
- Difficult to cache or shard by location
- Complex distance calculations (Haversine formula)
The Solution: Reduce Dimensions
Geohashing solves this by converting (lat, lon) into a single string:
San Francisco: (37.7749, -122.4194) → "9q8yy"
Now proximity becomes a simple prefix match:
-- Fast: Single column index, prefix search SELECT * FROM restaurants WHERE geohash LIKE '9q8%';
How Geohashing Works
1. Divide the World into a Grid
The algorithm recursively divides the world into smaller rectangles using binary space partitioning:
graph TD
World["World: lat [-90, 90], lon [-180, 180]"]
World -->|"lon ≥ 0 → 1"| East["Eastern Hemisphere"]
World -->|"lon < 0 → 0"| West["Western Hemisphere"]
West -->|"lat ≥ 0 → 1"| NW["Northwest Quadrant"]
West -->|"lat < 0 → 0"| SW["Southwest Quadrant"]
2. Interleave Bits (Z-Order Curve)
Alternate between longitude and latitude bits to create a single binary number:
Example: San Francisco (37.7749, -122.4194)
Step 1: Binary search on longitude [-180, 180] -122.4194 < 0 → bit 0 Step 2: Binary search on latitude [-90, 90] 37.7749 > 0 → bit 1 Step 3: Longitude [-180, 0] -122.4194 < -90 → bit 0 Step 4: Latitude [0, 90] 37.7749 < 45 → bit 0 ...continue for desired precision... Binary: 01001110110111001110110110111001
3. Base32 Encoding
Convert binary to base32 using this alphabet:
0123456789bcdefghjkmnpqrstuvwxyz (vowels removed to avoid accidental words)
Result: 9q8yy9r (7 characters)
Precision Levels
Geohash length determines the bounding box size:
| Length | Cell Width | Cell Height | Example | Use Case |
|---|---|---|---|---|
| 1 | ≤ 5,000km | ≤ 5,000km | 9 | Continent |
| 2 | ≤ 1,250km | ≤ 625km | 9q | Country/State |
| 3 | ≤ 156km | ≤ 156km | 9q8 | Large City |
| 4 | ≤ 39km | ≤ 19.5km | 9q8y | City District |
| 5 | ≤ 4.9km | ≤ 4.9km | 9q8yy | Neighborhood |
| 6 | ≤ 1.2km | ≤ 0.6km | 9q8yy9 | Street |
| 7 | ≤ 152m | ≤ 152m | 9q8yy9r | Building |
| 8 | ≤ 38m | ≤ 19m | 9q8yy9r3 | House |
| 9 | ≤ 4.8m | ≤ 4.8m | 9q8yy9r3v | Room |
Rule of thumb: Each additional character increases precision by ~5x.
Implementation Examples
Python
def encode_geohash(lat, lon, precision=7):
"""Encode lat/lon to geohash string"""
base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
lat_range, lon_range = [-90, 90], [-180, 180]
geohash = []
bits = 0
bit = 0
even = True # Start with longitude
while len(geohash) < precision:
if even: # Longitude
mid = (lon_range[0] + lon_range[1]) / 2
if lon > mid:
bit |= (1 << (4 - bits))
lon_range[0] = mid
else:
lon_range[1] = mid
else: # Latitude
mid = (lat_range[0] + lat_range[1]) / 2
if lat > mid:
bit |= (1 << (4 - bits))
lat_range[0] = mid
else:
lat_range[1] = mid
even = not even
bits += 1
if bits == 5:
geohash.append(base32[bit])
bits = 0
bit = 0
return ''.join(geohash)
# Example usage
print(encode_geohash(37.7749, -122.4194, 7)) # Output: 9q8yy9r
JavaScript (Using Library)
import ngeohash from 'ngeohash';
const location = { lat: 37.7749, lon: -122.4194 };
const hash = ngeohash.encode(location.lat, location.lon, 7);
console.log(hash); // "9q8yy9r"
// Decode back to coordinates
const coords = ngeohash.decode(hash);
console.log(coords); // { latitude: 37.775, longitude: -122.419 }
// Get bounding box
const bbox = ngeohash.decode_bbox(hash);
// [minLat, minLon, maxLat, maxLon]
Proximity Search Patterns
1. Prefix Matching
Find all points within the same geohash cell:
SELECT * FROM places WHERE geohash LIKE '9q8yy%' ORDER BY geohash;
2. Neighbor Search
Geohash cells have 8 neighbors. To search nearby:
const neighbors = ngeohash.neighbors('9q8yy');
// ['9q8yv', '9q8yy', '9q8yz', '9q8yw', '9q8yx', ...]
const query = `
SELECT * FROM places
WHERE geohash IN (${neighbors.map(n => `'${n}'`).join(',')})
`;
3. Expanding Search Radius
Start with current cell, expand to neighbors if needed:
def find_nearby(lat, lon, max_distance_km):
precision = 5 # Start with ~5km cells
hash = encode_geohash(lat, lon, precision)
while True:
results = query_db(f"WHERE geohash LIKE '{hash[:precision]}%'")
if len(results) > MIN_RESULTS or precision == 1:
return filter_by_distance(results, lat, lon, max_distance_km)
precision -= 1 # Expand search area
Edge Cases and Limitations
1. Geohash Boundary Problem
Points at geohash boundaries may be close but have different prefixes:
Point A: (37.7749, -122.4195) → 9q8yy8 Point B: (37.7750, -122.4194) → 9q8yy9 Distance: ~10 meters, but different hashes!
Solution: Always check neighboring cells for proximity searches.
2. Shape Distortion
Geohash cells are rectangles, not squares. Near poles, cells become very narrow:
- At equator: Cell is roughly square
- At 60° latitude: Cell is 2x wider than tall
- At poles: Cell is extremely elongated
3. Dateline and Pole Issues
- International Date Line: Longitude wraps from 180° to -180°
- Poles: Longitude becomes meaningless at ±90° latitude
Solution: Use alternative systems (S2 Geometry) for global applications.
Geohashing vs. Alternatives
| System | Pros | Cons | Best For |
|---|---|---|---|
| Geohash | Simple, human-readable, prefix matching | Cell boundaries, shape distortion | General proximity search |
| Quadtree | Hierarchical structure | More complex to implement | Varying density areas |
| S2 Geometry | No edge cases, consistent cell shapes | Complex, not human-readable | Global-scale apps |
| H3 (Uber) | Hexagonal cells, no gaps | Licensing, specialized | Ride-sharing, spatial analysis |
Real-World Use Cases
Uber / Yelp: "Find nearby drivers/restaurants"
// Find drivers near user (37.7749, -122.4194)
const userHash = geohash.encode(37.7749, -122.4194, 6); // "9q8yy9"
const nearby = geohash.neighbors(userHash);
db.drivers.find({
geohash: { $in: [userHash, ...nearby] },
available: true
});
MongoDB Geospatial Indexes
// Create geohash index
db.places.createIndex({ geohash: 1 });
// Query uses index efficiently
db.places.find({ geohash: /^9q8yy/ }).explain();
Redis Geo Commands
Redis internally uses geohash for geospatial features:
GEOADD locations 37.7749 -122.4194 "San Francisco" GEORADIUS locations 37.7 -122.4 10 km
Caching Location Data
# Cache key uses geohash for locality
cache_key = f"restaurants:{geohash[:5]}"
results = cache.get(cache_key)
if not results:
results = db.query(f"WHERE geohash LIKE '{geohash[:5]}%'")
cache.set(cache_key, results, ttl=300)
Interview Tips 💡
When discussing geohashing in system design interviews:
- Explain the problem first: "We need to efficiently find nearby locations, which is hard with 2D coordinates..."
- Justify geohash choice: "Geohashing converts 2D to 1D, enabling prefix-based searches and single-column indexes..."
- Mention precision: "We'd use 6-character hashes for city-level searches (~1km cells)..."
- Address edge cases: "We'd check neighboring cells since points near boundaries might have different hashes..."
- Discuss alternatives: "For a global app, we might consider S2 Geometry to avoid pole distortion..."
- Show awareness of libraries: "In production, we'd use proven libraries like ngeohash or Redis GEO commands..."
Related Concepts
- Quadtrees — Alternative hierarchical spatial indexing
- Consistent Hashing — Similar hashing concept for distributed systems
- Database Sharding — Can shard by geohash for geographic data
- Bloom Filters — Another space-efficient probabilistic structure
- Caching — Geohash enables location-based cache keys
About ScaleWiki
ScaleWiki is an interactive educational platform dedicated to demystifying distributed systems, software architecture, and system design. Our mission is to provide high-quality, technically accurate resources for software engineers preparing for interviews or solving complex scaling challenges in production.
Read more about our Editorial Guidelines & Authorship.
Educational Disclaimer: The architectural patterns and system designs discussed in this article are based on common industry practices, technical whitepapers, and public engineering blogs. Actual implementations in enterprise environments may vary significantly based on specific product requirements, legacy constraints, and evolving technologies.
Related Articles
Raft Consensus Algorithm
A comprehensive guide to Raft, the consensus algorithm powering Etcd, Consul, and Kubernetes. Leader election, log replication, safety guarantees, and production deployment patterns.
Token Bucket vs Leaky Bucket
A comprehensive comparison of two fundamental rate limiting algorithms: Token Bucket allows controlled bursts while Leaky Bucket enforces constant output rates.
Top-K Heavy Hitters (Streaming)
How to find 'Trending Topics' in massive real-time streams. MapReduce, Count-Min Sketches, and Sliding Windows.