Nibble: a small piece of food bitten off. In computing: half a byte of information. Every nibble explains a computing science or software engineering idea or system in five minutes.
Geohashes encode longitude and latitude in a simple URL-safe string for geohash.org. They're interesting because two geohashes that share the same prefix are pretty close together. For example, the Alexanderplatz in Berlin has geohash u33dc1r4. Nearby Brandenburger Tor has geohash u33db2jx - note the shared prefix u33d
.
As an aside, I have not programmed in Go before and just ran hello world. I wasn't going to leave that alliteration on the table!
What is a geohash?
A geohash encodes a latitude and longitude into a string of digits and letters. It’s base-32 encoded so uses 32 possible characters: 10 digits 0-9 and 22 letters a-z excluding a, i, l, and o.
For example, the geohash sunny decodes to 23.7 42.5, or N 23°42.000' E 42°30.000' in Saudi Arabia. Most geohashes aren't such memorable words, but if you're bored you can check where words or names are geolocated. My name is near the coast of Madagascar.
For readers who are geographically challenged, like me: when expressed as decimals, positive latitude is north of the Equator, and negative to the south. Latitude ranges from -90° to 90° at the south and north pole respectively. Positive longitude is east of the prime meridian (through Greenwich, UK), and negative is west. Longitude ranges from -180° to 180°. Good ol' Encyclopaedia Britannica has some nice visualizations.
As we'll see, by construction geohashes have a few interesting properties.
Two geohashes that share a common prefix have coordinates that are close together.
The longer the geohash, the more accurate it is. Longer geohashes describe smaller regions.
A geohash that adds characters to another geohash, is a sub-region of that geohash.
On the other hand, it is NOT necessarily the case that two nearby locations share a geohash prefix. Around the Equator, Prime Meridian, and the Poles, nearby locations can have completely different latitudes and longitudes, which result in different geohashes.
Z-order curves
Geohashes map latitude and longitude on a z-order curve. A z-order curve is a general mapping from a 2-dimensional coordinate to a one-dimensional coordinate - a point on a line. The mapping preserves locality: if two 2D coordinates are near, their 1D coordinates are also near. Z-order curves are a particular type of space-filling curve: a curve whose range contains an entire 2-dimensional square. In simple terms: a contiguous line through a table that passes through all the table's cells once.
Z-order curves are simple to construct: interleave the bits of the x and y coordinates to get the 1D coordinate. If you lay out your axes right, that creates a recursive Z-shaped curve in the 2-dimensional plane:
The table has the x coordinates 0-3 horizontally at the top, and the y coordinates vertically on the left. The dashed line shows the resulting z-order, starting from the top left.
Z-order curves can be generalized to more than two dimensions and are useful when multidimensional data needs to be laid out sequentially, typically for performance reasons. Examples are:
Database indexes for locations. Allows searching for nearby locations by looking for similar prefixes.
In-memory layout of texture maps in GPUs. Optimizes memory layout to reduce cache misses.
Efficient matrix multiplication, to reduce cache misses.
Decoding a geohash
First, let's try to decode a given geohash to latitude and longitude.
The easy part is finding the bit representation of a geohash string like "gbsuv". Since geohash uses base-32 encoding, each character corresponds to 5 bits. All we need is a constant containing the allowable characters. The index of the character in the string contains the 5 bits we're interested in:
const base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
c := strings.IndexRune(base32, letter)
Now how to process those bits into latitude and longitude?
First, we know that a geohash is a z-order curve, so its bits are longitude and latitude interleaved. The even bits in a geohash are longitude, the odd bits are latitude.
Second, a geohash does not describe an exact location. The coordinate geohash.org returns is really the mid-point of a "rectangular" region. Rectangular is in scare quotes because it's a rectangle projected on a sphere - at the poles, it's a triangle! The region can be described by a minimum and maximum longitude and latitude, for the two corners of the “rectangle”.
Here's an animation of how 10 bits converge to a smaller and smaller region (in orange) - each bit moves one of the corners:
The empty geohash represents the entire Earth. The first bit determines whether the location is in the eastern or western hemisphere. The second bit determines whether the region is in the northern or southern hemisphere - after two bits we know in which quadrant of the Earth the location is. Each bit approximately halves the region further, along east-west or north-south.
The longer the geohash, the smaller the region. To calibrate your intuition, a geohash of 8 characters describes a region of 40m by 40m.
Here's the Go code to decode a geohash:
const latMin, latMax = -90.0, 90.0
const lonMin, lonMax = -180.0, 180.0
func decode(geohash string) (lat, lon float64) {
curMinLat, curMaxLat := latMin, latMax
curMinLon, curMaxLon := lonMin, lonMax
bitIsLongitude := true
// loop through each character in the geohash from left to right
for _, character := range geohash {
// decode the base32 representation
nibblebit := strings.IndexRune(base32, character)
// each character represents 5 bits
for b := 4; b >= 0; b-- {
// get the bit at position b
bit := nibblebit & (1 << uint(b))
if bitIsLongitude {
if bit == 0 {
curMaxLon = (curMinLon + curMaxLon) / 2
} else {
curMinLon = (curMinLon + curMaxLon) / 2
}
} else {
if bit == 0 {
curMaxLat = (curMinLat + curMaxLat) / 2
} else {
curMinLat = (curMinLat + curMaxLat) / 2
}
}
bitIsLongitude = !bitIsLongitude
}
}
// we now have a region - return the midpoint
return (curMinLat + curMaxLat) / 2, (curMinLon + curMaxLon) / 2
}
Encoding a geohash
Encoding a geohash from a latitude and longitude works similarly.
Again we start with a region that encompasses the entire Earth. First, we determine whether the longitude is in the western or eastern hemisphere, and add a 0 or 1, respectively. Then we look at latitude to determine the next bit, halving the region again.
func encode(lat, lon float64, precision int) string {
var geohash []rune
if precision > 12 {
precision = 12
}
curMinLat, curMaxLat := latMin, latMax
curMinLon, curMaxLon := lonMin, lonMax
bitIsLongitude := true
nibblebit_idx := 4
nibblebit := 0
for len(geohash) < precision {
if bitIsLongitude {
mid := (curMinLon + curMaxLon) / 2
if lon > mid {
nibblebit |= 1 << uint(nibblebit_idx)
curMinLon = mid
} else {
curMaxLon = mid
}
} else {
mid := (curMinLat + curMaxLat) / 2
if lat > mid {
nibblebit |= 1 << uint(nibblebit_idx)
curMinLat = mid
} else {
curMaxLat = mid
}
}
bitIsLongitude = !bitIsLongitude
nibblebit_idx--
if nibblebit_idx == -1 {
// we have a full base32 character
geohash = append(geohash, rune(base32[nibblebit]))
nibblebit_idx = 4
nibblebit = 0
}
}
return string(geohash)
}
A speed bump was that I didn't realize when to stop - every region can be divided further in half forever (until you hit the limit of floating point precision). To fix that, I've added a precision
argument to specify the number of characters the resulting geohash should have.
That's a wrap
That concludes geohashes and z-order curves. Z-order curves will be useful to create quadtrees, a data structure with further geographical and computer geometry applications. I'll discuss quadtrees in the next nibble, so stay tuned.
A note about Go: I understand why it's popular. It immediately felt familiar and avoids the ridiculous explosion of features some other languages have accumulated. I also like that it's garbage collected. GC is perfect for loads of applications, take it from John Carmack!
References and Acknowledgements
Encode and decode geohashes in the browser and reference JavaScript code
GitHub's CoPilot generated a lot of the Go code. Memorable instances: I typed
base32
and it filled in the constants. I typeddecode(geohash
and it filled in quite a bit of thedecode
function, including the outline of the algorithm. The code did have several bugs and issues.Illustrations created with Manim community.