Geo-enabled (or location-enabled) applications are very common nowadays and many of them use MySQL spatial functions. The common tasks for such applications are:

  • Find all points of interests (i.e. coffee shops) around (i.e. a 10-mile radius) the given location (latitude and longitude). For example, we want to show this to a user of the mobile application when we know his/her approximate location. (This usually means we need to calculate a distance between 2 points on Earth).
  • Find a ZIP code (U.S. Postal address) for the given location or determine if this location is within the given area. Another example is to find a school district for the given property.

MySQL spatial functions were originally included (implementation follows a subset of OpenGIS standard). However, there are 2 major limitations of MySQL spatial functions that can make it difficult to use those functions in geo-enabled applications:

  • Distance between 2 points.  The “distance” function was not implemented before MySQL 5.6. In addition (even in MySQL 5.6), all calculations (e.g. distance between 2 points) are done using a planar coordinate system (Euclidean geometry). For the distance between 2 points on Earth, this can produce incorrect results.
  • Determine if the point is inside a polygon. Before MySQL 5.6 the functions that test the spatial relationships between 2 geometries (i.e. find if the given point is within a polygon) only used a Minimum Bounding Rectangle (MBR). This is a major limitation for example #2 above (I will explain it below).

In my old presentation for the 2006 MySQL User Conference I  showed how to calculate distances on Earth in MySQL without using the MySQL spatial functions. In short, one can store the latitude and longitude coordinates directly in MySQL fields (decimal) and use a haversine formula to calculate distance.

New MySQL Spatial Functions in 5.6

The good news is:

1) MySQL 5.6 adds a set of new functions (some of them are not 100% documented though) that use the object shapes rather than the MBR to calculate spatial relationships. Those new functions begin with “ST_”, i.e.

  • contains(g1, g2)  uses MBR only (not exact!)
  • st_contains(g1, g2) uses exact shapes

2) MySQL 5.6 implements st_distance(g1, g2) function that calculates the distance between 2 geometries, which is currently not documented (I’ve filed the feature request to document the st_distance function in MySQL)

The bad news is:

1) All functions still only use the planar system coordinates. Different SRIDs are not supported.

2) Spatial indexes (RTREE) are only supported for MyISAM tables. One can use the functions for InnoDB tables, but it will not use spatial keys.

Example of MySQL’s MBR “false positives”

To illustrate why we do not want to use MBR-based functions for geospatial search, I’ve generated 2 polygons that represent 2 zip code boundaries in San Francisco, CA and placed it on Google Maps.

The blue rectangle represents the Minimum Bounding Rectangle of Zip code “91102” (I’ve used envelope() mysql function to obtain coordinates for the MBR). As we can see it covers both zip code 94103 and 94102. In this case if we have coordinates of a building in the city’s “south of market” district (ZIP 91103) and try to find a zip code it belongs to using the “contains()” function we will have a “false positives”:

mbr_example_sm

In this particular example we got 3 zip codes as the MBR of 94158 also overlaps this area. Another point in “south of market” can actually produce 4 different zip codes. However, in MySQL 5.6 we can use the new st_contains function:

As we can see st_contains() produces the correct results.

Find a ZIP code for the given location

Starting with MySQL 5.6 one can use the MySQL spatial functions st_contains or st_within to find if the given point is inside the given polygon. In our scenario, we will need to find the zip code for the given latitude and longitude. To do that in MySQL we can perform the following steps:

  1. Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts). The data will be stored as a MySQL Geometry object, to convert it to text we can use astext(geom) function.
  2. Use the st_contains() or st_within() functions:

    or

Spatial Index for “ST_” functions

MyISAM tables support Spatial indexes, so the above queries will use those indexes. Example:

As we can see our spatial index is used for those functions. If we ignore or remove the index, the query will run significantly slower:

The InnoDB engine does not support spatial indexes, so those queries will be slow. As zip boundaries do not change often we can potentially use MyISAM tables for them.

Find all coffee shops in a 10-mile radius

MySQL 5.6 supports st_distance functions with 2 drawbacks:

  1. It only supports planar coordinates
  2. It does not use index

Given those major limitations, it is not very easy to use st_distance function for the geo enabled applications. If we simply need to find a distance between 2 points it is easier to store lat, lon directly and use harvesine expression (as described above).

However, it is still possible to use the st_distance() if we do not need exact numbers for the distance between 2 points (i.e. we only need to sort by distance). In our example, to find all the coffee shops we will need to:

  1. Get the 10 mile radius MBR and use “within()” or “st_within()” function
  2. Use st_distance function in the order by clause

First, we will calculate an envelope (square) to include approximately 10 miles, using the following approximations:

  • 1 degree of latitude ~= 69 miles
  • 1 degree of longitude ~= cos(latitude)*69 miles

@lat and @lon in this example are the coordinates for the San Francisco International Airport (SFO).

This will give us a set of coordinates (points) for the lower left and upper right corner of our square. Then we can use a MySQL’s envelope function to generate the MBR (we use linestring to draw a line between the 2 generated points and then envelope to draw a square):

The “envelope” will look like this:

envelope_example

This is not exactly a 10-mile radius, however, it may be close enough. Now we can find all points around SFO airport and sort by distance.

As we can see from the explain it will use the spatial key on SHAPE and will only scan 430 rows, rather than millions of POIs.

The query does not show the exact distance (this may be ok if we only need to output the points on the map).  If we need to show the distance we can use the harvesine formula to calculate that. For example, we can create the following stored function to implement the calculations:

And then use it for both order by and to displaying the distance. This query will also filter by “coffee”:

Conclusion

MySQL 5.6 implements an additional set of functions that can help create geo-enabled applications with MySQL. Storing polygons boundaries (ZIP code boundaries for example) is efficient and the new spatial functions (st_within, st_contains, etc) will produce correct results and will use spatial (rtree) indexes (for MyISAM tables only). The OpenGIS standard is very common and it is easy to obtain the data in this format or use the standard application which can “talk” this language.

Unfortunately, st_distance function is not very useful for calculating distance between 2 points on Earth and it does not use an index. In this case, it is still more feasible to calculate distances manually using the harvesine formula. Hopefully, this will be fixed in the next mysql release.

There are also some other limitations, for example, st_union() function only supports 2 arguments and does not support an array, so it can’t be used in queries like “select st_union(geom) from zipcodes group by state”.

Links

And finally, let me know in the comments how you use MySQL for geo-enabled applications. In my next post, I will talk more about the basics of the MySQL geospatial extension as well as Sphinx Search‘s implementation of the Geospatial functions.

40 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Matt Brooks

Hi Alex,

Thank you for this post. I am having this problem above as I would like to use your query concept above but on a database of 800,000 records. Do you think it will perform well?

I posted my query here. Any help would be stellar!!!

http://stackoverflow.com/questions/20393104/mysql-spatial-indexing-with-multiple-left-join-on-large-database

Matt Brooks

Hi Alex,

Thank you for the help. Much appreciated. Please contact me as we are interested in working with you to help optimize some of our geo location tables for our directory software platform.

Ollie Jones

Awesome write up, thanks!
Nitpick, it’s haversine, not harvesine.

Aleksander Brancewicz

Hu Alex,

I’ve followed your brilliant guide to create a db which contains poi. Unfortunately once I’m explaining the within distance query I see that if there are more then 4 items explain shows ALL as a type of a query. Could you please spell it out for my why is that and how should I overcome it?
My table is:
create table marketing_action
(id int primary key auto_increment, times_seen mediumint,likes mediumint, ranking smallint,geo_position geometry not null, title varchar(35), descrption varchar(60) )
default character set utf8 default collate utf8_general_ci
engine = myisam;
alter table marketing_action add spatial index geo_position_spatial_index (geo_position);

and query which I use is:
select harvesine(y(geo_position),x(geo_position),@lat,@lon) as dist ,X(geo_position),Y(geo_position), title from marketing_action
where st_within(geo_position, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))));

the data I populate is:
truncate table marketing_action;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 30, geo_position = GeomFromText(‘POINT (4.82677 52.27516)’),title = ‘Nieuwe Schoenen bij Yanaike en Ron’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (4.82677 52.26516)’),title = ‘Menu van de dag bij Smerige Bloem restaurant’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 38, geo_position = GeomFromText(‘POINT (4.82677 52.22345)’),title = ‘Super korting bij Wijnen van Jordy’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.01252 52.30230)’),title = ‘OP=OP bij Yofkes Niks’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.00235 52.27000)’),title = ‘OP=OP bij kruiden’;
insert into marketing_action set times_seen = 10,likes = 3, ranking = 15, geo_position = GeomFromText(‘POINT (4.82677 52.20123)’),title = ‘Een concert bij het restaurant happje snappje’;

As I said once I remove 2 of random items MySQL starts using spatial index.

Thanks up front for your answer!!

Val Vinder

Hi Alex,

very informative article, thanks

You’ve mentioned that you might be able to cover in more details in the future:

“Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts)”

is there any way you can publish a quick walkthrough or perhaps a link reference to the process

Thanks a lot

Sakib

Thanks for your great post. I have following scenario:

The following query takes 22 sec in table. I wanna mention than in my business table I have 10+ million data. Also, by doing a count(name) I found it got 1992312 data (~2 million) . Is it expected or Im doing anything wrong? Im using mysql 5.6.16 and I do have a spatial index on spatial_location field.

Query:
set @lon = -74.00597309999999;
set @lat= 40.7143528;
set @dist = 200;

set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);

select name from business
where st_within(spatial_location, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), spatial_location)
limit 10;

Sakib

This is the result of explain:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE business range spatial_location spatial_location 34 NULL 941211 “Using where”

Aleksander Brancewicz

Hi Alex,

That’s indeed a solution with more rows the query starts to be executed with an index. Thanks!

Sakib

Hello Alex,
Thanks for quick reply. So, it looks like its a performance issue of st_within() method. Prob is that I cant keep a smaller value for @dist as for rural area the amount of data would be less. I tried the same query in NoSQL database (MongoDB) and for the same query the response time is 0.3 second. But, NoSQL has poor fulltext search functionality which is better in MySQL. So, looks like Im kind of stuck in middle of MySQL and NoSQL 🙁

Sakib

Ya thats is what I was thinking too. But the main problem is to create an algorithm to get the desired number of data with as less increment step as possible. Lets think we need to fetch 1000 data. and with @dist =10, we get 150 data. So, on the next step, we should increment by 1000/150 = 6.66 ~ 7 times. Means, next query will run with @dist = (10 * 7) =70.

I know this may not be a perfect solution but what do you think? Can you suggest me any better idea?

Sakib

Hello Alex,
The app is a directory listing site and I need to show all the businesses within a radius around a given point. So, unfortunate I cant use any additional filter here. I will create a stored procedure and go for the incremental process that we discuss here. Thanks for helping me out 🙂

Rahul Kumar Mehta

Hi Alexander Rubin,

I need to get distance of a ponit (x, y) from a polygon.
Do we have any function in mysql to calculate the same??

Regards,
Rahul

Bowo

It seems like the Polygon-in-Polygon-test does not work most of the time.

SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
+————-+
| ST_CON|
+————-+
| 0 |
+————-+

SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
+————-+
| ST_CON |
+————-+
| 0 |
+————-+

Bowo

But It gives true answer while using MBRCONTAINS

SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘MBRCON’;
+————-+
| MBRCON|
+————-+
| 1 |
+————-+

SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘MBRCON’;
+————-+
| MBRCON |
+————-+
| 1 |
+————-+

SELECT MBRContains(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((3 3,3 4,4 4,3 3))’)) as ‘MBRCON’;
+————-+
| MBRCON |
+————-+
| 0 |
+————-+

dexxtr

you can also try my variant. that is how to determine point or points inside circle

>>> SELECT * FROM locator WHERE SQRT(POW(X(center) – 49.843317 , 2) + POW(Y(center) – 24.026642, 2)) * 100 < radius

http://dexxtr.com/post/83498801191/how-to-determine-point-inside-circle-using-mysql

Hope this helps.

Sebastian

Hello Alexander,

I would like to combine a spatial index with a Btree index on a varchar. For example I would like to know where all airports are within a certain region, and to speed up the query I don’t want to consider rows in my table that represent coffeeshops.

So in addition to the spatial index I also want to provide a type of the place, is that possible? And could you give me some hints on where I could find the information to do so?

My question is also here:
http://stackoverflow.com/questions/24088337/how-can-i-combine-a-btree-and-geograhpical-or-spatial-index-to-speed-up-my-query

Thanks a lot in advance.

Sebastian

Sebastian

Hi Alexander,

Thanks for your help and quick reply. I tried option b) the approach using a multicolumn BTREE index, however it appeared to be twice as slow as my original approach using the spatial index. Therefore I will stick with the spatial index for now and split up my table for different types of places, which I think is the best solution in terms of performance.

dedy

hi mr.alex, my english not good but i need ur help

i will to show polyline to leaflet map from mysql database, my coordinate type is: linestring,
values ex: -7.98478 112.63213,-7.90000 112.60000 (two point).

this leaflet function for showing polyline to map
L.polyline([[-7.94432, 112.64917],[-7.94993, 112.66085],[-7.95401, 112.69947]]).addTo(map);

what query can use?

Moma

Hello Alexander,

Thank you for the great article.

I have a problem when I am inserting simple spatial data into mysql. Instead of actual data I am getting some weird symbols in the rows. These are the queries that I am running, directly taken from the mysql documentation:

CREATE TABLE geom (g GEOMETRY);
INSERT INTO geom VALUES (GeomFromText(‘POINT(1 1)’));

This is the result of the query:
‘�?�?@@’
‘�?�?’
‘�?�?’
‘�?�?’

Do you have any idea why is this happening?

Thank you a lot in advance!

Tibin V

Hello Alex,

Thank you for the great article.
How can I fetch polygons in parent child manner
http://stackoverflow.com/questions/25115565/how-to-design-a-databases-table-to-store-mysql-spatial-data-in-a-parent-child-r

Srithar

Hello sir,,, i dont know waypoints field….plz create table with starting from end,,,i want to find points of interest using mysql spatial queries,,,plz help me

Daniel

Hi Alex,
thank you for this great post! It looks like InnoDB supports spatial indexes now with MySQL 5.7.5:
http://dev.mysql.com/doc/refman/5.7/en/creating-spatial-indexes.html
“For MyISAM and (as of MySQL 5.7.5) InnoDB tables, MySQL can create spatial indexes using syntax similar to that for creating regular indexes, but using the SPATIAL keyword.”

David Guy

Hi Alex
First of all I must admit that I am a ‘Silver Surfer’ and am not a coder as such, so if the answer to my question is positive I will need to find some kind so soul to help me with the coding!
My question is based on the known positions of aircraft.
I collect position data from aircraft using an ADSB receiver and I would like to know if it is possible to work out if they are within a set distance (10 miles) from a known point (an airfield)
Please be gentle with me
Thanks and Regards
Dave

Anil Jangra

i am using following query to get distance from given lat lng and sorted.
but i am getting empty resultset even when i have a value with same lat lng (29.39,81.31) in my db.where am i wrong

Query is ..

set @lat= 29.39;
set @lon = 81.31;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);
select harvesine(x(pt), y(pt), @lat, @lon ) as dist, id from position
where st_within(pt, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by dist limit 10;

Structure is http://postimg.org/image/o90p7ycpf/

scott taggart

Thanks for this post – it was exactly what I needed on a project I am working on. Good stuff and greatly appreciated.

Ranjit

Alexander Rubin,

I need to select records which present around 30 miles circle from current location, I used geohash and also long and lat. I have millions of record in cluster. Which will be the best option to query data with high performance.

helseb

Hi everyone,

I would like to draw a 500m buffer around a GPS point with mySQL.
I done the following query :

set @lat= 48.856614;
set @lon = 2.352222;
set @dist = 500;
select st_astext(st_buffer(point(@lat,@lon), @dist));

Unfortunately, it seems that all points returned by MySQL are not GPS point…

What am I doing wrong ?

Fabio H de O Rangel

Dear Alex good Night, see if you can help me, can you check what is wrong in my code above, because is not working.

my column geom -22.91602,-41.97678/-22.94322,-42.11171/-23.02634,-41.99945/
my query $sql = “select * from cerca_gps where ST_Contains(geon, point(-22.96850902, -42.0272541));”;

giusepped

Spatial is supported with TokuDB?

Code Junkie

Hi Alex!
Can you please share the “waypoints” table structure?