A scenario arose recently that required me to be able to find all of the cities within a given radius from a geographic center-point (latitude / longitude).
At first I was tempted to utilize an existing mapping service to handle this for me, but most of those services impose limits on the number of requests that you can send in a given time period. In this instance, I needed something that was not going to be subject to those limits.
There are a lot of different ways to solve this problem. In this post, I am going to cover how to do so using SQL Server 2012+.
The first step is to locate a list of cities and their latitude/longitude. A good source for this is GeoNames.org. There are a wide variety of data downloads available. For my purposes, I am going to grab Cities5000.zip. This archive contains a list of all of the cities with a population of 5,000 or more.
There is a lot of really good documentation about what the download contains, but for ease of reference, I'll reproduce it here:
geonameid : integer id of record in geonames database name : name of geographical point (utf8) varchar(200) asciiname : name of geographical point in plain ascii characters, varchar(200) alternatenames : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000) latitude : latitude in decimal degrees (wgs84) longitude : longitude in decimal degrees (wgs84) feature class : see http://www.geonames.org/export/codes.html, char(1) feature code : see http://www.geonames.org/export/codes.html, varchar(10) country code : ISO-3166 2-letter country code, 2 characters cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20) admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) admin3 code : code for third level administrative division, varchar(20) admin4 code : code for fourth level administrative division, varchar(20) population : bigint (8 byte int) elevation : in meters, integer dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat. timezone : the iana timezone id (see file timeZone.txt) varchar(40) modification date : date of last modification in yyyy-MM-dd format
You can pull all or part of the data into SQL. I am going to pull in all the columns. I don't need all of them at the moment, but they could be useful later on.
The following will create a table for the city data and use
bulk import to populate it.
Note: this assumes that you have unzipped the data to c:\temp
First create the table
CREATE TABLE Cities ( geonameid int NOT NULL PRIMARY KEY, [name] nvarchar(200) NOT NULL, asciiname nvarchar(200) NOT NULL, alternatenames nvarchar(max), latitude numeric(18,15), longitude numeric(18,15), feature_class char(1), feature_code varchar(10), country_code char(2), cc2 nvarchar(200), admin1_code nvarchar(20), admin2_code nvarchar(80), admin3_code nvarchar(20), admin4_code nvarchar(20), [population] decimal, elevation int, dem int, timezone nvarchar(40), modification_date_tmp nvarchar(50) )
Now, import the data
BULK INSERT Cities FROM 'c:\temp\cities5000.txt' WITH ( FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n' );
Ok. We have our raw data. Now we need to let SQL Server know how to plot each of these cities on a geographic plane. To do this, we are going to use the
geography spatial data type that is available in SQL Server.
geography is actually a .Net CLR data type that is specifically tailored to work with latitude and longitude. You can read more about
ALTER TABLE Cities ADD Point AS CONVERT([geography], CASE WHEN [Latitude]<>(0) AND [Longitude]<>(0) THEN Geography::Point([Latitude],[Longitude],(4326)) END,(0))
Let's break down what is happening above.
First, I'm adding a
Point as a column to
Cities. In this instance, I'm creating a computed column. As
Longitude is updated, this will automatically update the value for
Point. If you would prefer, you can create
Point as a
geography column and run an
update statement to do this calculation once. (just remember to re-run
update if you re-import the data!).
CONVERT to turn the data into a
I'm wrapping the data in a
CASE statement to skip the conversion if
Longitude aren't set properly.
Finally, I'm converting the
Longitude into a geographical point using
:: is SQL Server syntax for calling the
Point method on the
geography data type. Think of it like a static method on a class.
In this case,
Point takes a latitude, a longitude and something called a SRID (Spatial Reference Identifier). In this case, I'm using 4326 which gives us the standard -180:180/-90:90 that you're probably used to seeing.
If you open SQL Server Management Studio (SSMS) and select all the records from cities, you should see a new tab called "Spatial Results". Clicking that will give you something similar this.
Now then, what if we want to use this data to find all of the cities within 5 miles of Manhattan?
First, let's grab the
Point from Manhattan.
SELECT Name, Point FROM cities WHERE name = 'Manhattan' AND admin1_code = 'NY'
The result should resemble the following:
That hex string on the right is the text representation of Manhattan's center-point.
To draw a 25 mile circle around that center-point we are going to use another built-in method called STBuffer.
STBuffer takes a single argument of Distance which is defined in meters. Since we are trying to work in miles, we are going to have to do a conversion.
SELECT Name, Point, Point.STBuffer(5 * 1609.344) as SearchArea FROM cities WHERE name = 'Manhattan' AND admin1_code = 'NY'
The 1609.344 in the query above is the approximate meters per mile.
This time if you look at the Spatial Results (and select
SearchArea from the dropdown on the right) you'll see something like this:
Now for the final step.
DECLARE @SearchArea GEOGRAPHY SELECT @SearchArea = Point.STBuffer(5*1609.344) FROM cities WHERE name = 'Manhattan' AND admin1_code = 'NY' SELECT Name, Point FROM Cities WHERE [point].STIntersects(@SearchArea) = 1
In the query above, I have saved the search area we defined to a variable so that it is easier to use later on.
The heavy lifting here is being done by STIntersects. Like STBuffer, STIntersects is a method available off of the
geography type. In this case, it takes another
geography type as its argument. STIntersects will determine if the two
geography instances cross over one another.
It is important to notice that this ellipse from the previous step is solid and not outlined. Since our goal is to find everything contained inside of the radius, if we only had the outline of a circle, then the only place that it would intersect would be along the exact outside of the circle. In all likelihood we would not get any results at all!
What did we get for all of our hard effort? Here are all the cities (with a population of 5,000 or more) within 5 miles of Manhattan's center-point.
West New York
Long Island City
This barely scratches the surface of what is available as part of the Geography/Geometry functionality inside of SQL Server. For a more complete list of all of the OGC (STXXXX) methods refer to the MSDN article