Location-based search results with DynamoDB and Geohash

Searching for the nearest or farthest location is easy thanks to DynamoDB and a small geocoding NPM package

Many datasets now include geospatial information — especially if you are working with mobile apps or Google Maps. Geospatial data is a fancy way of describing items containing a latitude and longitude, but handling queries based on the nearest, furthest or within a certain distance has some complexity.

Why is this difficult? Imagine you have 10 retail stores and a customer wants to know the nearest location. The obvious way is to find the location of the customer and then calculate the distance between that location and each of the stores. What’s the problem?

Well, there are two problems —the first is that the previous calculation assumes the two points are on a flat surface and the world isn’t flat (really, it isn’t). You have to factor in the roundness of the rock we live on, which brings us to the Haversine formula.

While the first problem will lead to calculation errors, the second problem is much more problematic. As the number of retail stores increases, the speed of your search slows down — the dreaded O(n) — because you must compare the user’s location against every store. And what if the user is moving so the original location updates frequently?

If you broaden the problem to “find the nearest coffee shop”, our first solution would require us to compare the user’s location to every coffee shop on the planet. So we need a much better way of solving the question, and preferably one that can manage global scale. Basically, something like the way Google Maps does it.

The Power of Geohash

Geohashes are a little complicated — but the key thing to know is that it divides the world into a number of squares in a grid. By doing this, we can eliminate most of the data upfront and only focus on the square our potential targets are in. It’s a pretty elegant solution.

  • The hash is up to 12 characters in length — the longer the hash, the smaller the square it references.
  • The first character of the hash identifies one of 32 cells in the grid, roughly 5000km x 5000km on the planet.
  • The second character identifies one of the 32 squares in the first cell, so the resolution of the two characters combined is now 1250km x 1250km.
  • This continues until the twelfth character, which can identify an area as small as just a couple of square inches on Earth.
For a detailed explanation, see https://www.movable-type.co.uk/scripts/geohash.html.

If you use a hash that is too long — in other words, the squares are too small — you will end up searching more squares for the results. If you use a hash that’s too short, you will end up returning too many items in the square and your search won’t be very fast. So you need to know the resolution of your search before selecting the length of the hash.

That might sound difficult — but it’s easier than you’d think. For our example of finding retail stores and coffee shops, it’s not necessary to have a 2-inch grid of the planet. We can resolve 1–10 kilometer searches with 5–7 characters.

The Geo Library for Amazon DynamoDB

There’s an AWS blog post from 2013 that details the Geo Library but it’s a Java-specific solution that needs a server. More recently there’s an NPM package that ports this work across to Node and can be used within a serverless architecture.

To test this out, I found a list of all the Starbucks locations in the US and then discovered a link where someone had turned this into JSON. Following the documentation in the NPM package, I cobbled together this script to configure the DynamoDB table and load it with data.

Let me walk you through the highlights.

First, let’s create the table which spins up a new DynamoDB table with the partitionKey and indexes needed to make the search work.

const ddbGeo = require('dynamodb-geo')

const config = new ddbGeo.GeoDataManagerConfiguration(ddb, 'yourTableName')
config.hashKeyLength = 5
const myGeoTableManager = new ddbGeo.GeoDataManager(config)
const setupTable = () => {

const createTableInput = ddbGeo.GeoTableUtil.getCreateTableRequest(config)

createTableInput.ProvisionedThroughput.ReadCapacityUnits = 5
console.dir(createTableInput, { depth: null })
  ddb.createTable(createTableInput).promise()
.then(function () { return ddb.waitFor('tableExists', { TableName: config.tableName }).promise() })
.then(function () { console.log('Table created and ready!') })
}

Next we load up the data:

const loadTable = () => {

const data = require('../data/starbucks_us_locations')

const putPointInputs = data.map(function (location) {
return {
RangeKeyValue: { S: uuid.v4() }
GeoPoint: {
latitude: location.position.lat,
longitude: location.position.lng
},
PutItemInput: {
Item: {
name: { S: location.name },
address: { S: location.address }
}
}
}
})

const BATCH_SIZE = 25
const WAIT_BETWEEN_BATCHES_MS = 1000
let currentBatch = 1

function resumeWriting() {
if (putPointInputs.length === 0) return Promise.resolve()
const thisBatch = []
for (let i = 0, itemToAdd = null; i < BATCH_SIZE && (itemToAdd = putPointInputs.shift()); i++) {
thisBatch.push(itemToAdd)
}
console.log('Writing batch ' + (currentBatch++) + '/' + Math.ceil(data.length / BATCH_SIZE))

return myGeoTableManager.batchWritePoints(thisBatch).promise()
.then(function () {
return new Promise(function (resolve) {
setInterval(resolve,WAIT_BETWEEN_BATCHES_MS)
})
})
.then(function () {
return resumeWriting()
})
}
return resumeWriting().catch(function (error) {
console.warn(error)
})
}

This loads the Starbucks locations from the json file, creates an array of items to insert into the tables, and uploads into DynamoDB in batches of 25 items. There is a delay introduced between each batch to slow down the insertion process, and reduce the burn on the Write Capacity Units (WCUs).

Once uploaded, you can see all the locations together with their geohash values and hash keys, which were calculated by the library automatically:

With the data loaded, querying is simple — we just supply the latitude and longitude of the search location, together with the radius of the search area:

const AWS = require('aws-sdk')
AWS.config.update({region: 'us-east-1'})

const ddb = new AWS.DynamoDB()
const ddbGeo = require('dynamodb-geo')

const config = new ddbGeo.GeoDataManagerConfiguration(ddb, 'yourTableName')
config.hashKeyLength = 5

const myGeoTableManager = new ddbGeo.GeoDataManager(config)

myGeoTableManager.queryRadius({
RadiusInMeter: 1000,
CenterPoint: {
latitude: 40.7769099,
longitude: -73.9822532
}
})
.then((locations) => console.log(locations))

Let’s try it out!

I built a simple front-end that uses the Starbucks data list to show the nearest Starbucks within 1 kilometer — it defaults to New York City since this looks like the most densely packed location for their stores.

If you click around the map, it fetches the list from DynamoDB based upon the new center of the map:

Visit https://caffeinate.jbes.dev/

Overall, the advantage of using DynamoDB for this solution is the extremely steady performance under load. And with on-demand capacity now available for DynamoDB, it can handle spiky traffic without you needing to manage provisioned throughput.

Let me know what you think in the comments below!


Location-based search results with DynamoDB and Geohash was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story.