July 31, 2014

Using the new spatial functions in MySQL 5.6 for geo-enabled applications

Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL. The common tasks for such applications are:

  • Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile application when we know his/her approximate location. (This usually means we need to calculate a distance between 2 points on Earth).
  • Find a ZIP code (U.S. Postal address) for the given location or determine if this location is within the given area. Another example is to find a school district for the given property.

MySQL had the spatial functions originally (implementation follows a subset of OpenGIS standard). However, there are 2 major limitation of MySQL spatial functions that can make it difficult to use those functions in geo-enabled applications:

  • Distance between 2 points.  The “distance” function was not implemented before MySQL 5.6. In addition (even in MySQL 5.6), all calculations (e.g. distance between 2 points) are done using a planar coordinate system (Euclidean geometry). For the distance between 2 points on Earth this can produce incorrect results.
  • Determine if the point is inside a polygon. Before MySQL 5.6 the functions that test the spatial relationships between 2 geometries (i.e. find if the given point is within a polygon) only used a Minimum Bounding Rectangle (MBR). This is a major limitation for example #2 above (I will explain it below).

In my old presentation for the 2006 MySQL User Conference I  showed how to calculate distances on Earth in MySQL without using the MySQL spatial functions. In short, one can store the latitude and longitude coordinates directly in MySQL fields (decimal) and use a haversine  formula to calculate distance.

New MySQL 5.6 Geo Spatial Functions 

The good news is:

1) MySQL 5.6 adds a set of new functions (some of them are not 100% documented though) that use the object shapes rather than the MBR to calculate spatial relationships. Those new functions begins with “ST_”, i.e.

  • contains(g1, g2)  uses MBR only (not exact!)
  • st_contains(g1, g2) uses exact shapes

2) MySQL 5.6 implements st_distance(g1, g2) function that calculates the distance between 2 geometries, which is currently not documented (I’ve filed the feature request to document the st_distance function in MySQL)

The bad news is:

1) All functions still only use the planar system coordinates. Different SRIDs are not supported.

2) Spatial indexes (RTREE) are only supported for MyISAM tables. One can use the functions for InnoDB tables, but it will not use spatial keys.

Example of MySQL’s MBR “false positives”

To illustrate why we do not want to use MBR-based functions for geospatial search, I’ve generated 2 polygons that represent 2 zip code boundaries in San Francisco, CA and placed it on Google Maps.

The blue rectangle represents the Minimum Bounding Rectangle of Zip code “91102″ (I’ve used envelope() mysql function to obtain coordinates for the MBR). As we can see it covers both zip code 94103 and 94102. In this case if we have coordinates of a building in the city’s “south of market” district (ZIP 91103) and try to find a zip code it belongs to using the “contains()” function we will have a “false positives”:

mbr_example_sm

In this particular example we got 3 zip codes as the MBR of 94158 also overlaps this area. Another point in “south of market” can actually produce 4 different zip codes. However, in MySQL 5.6 we can use the new st_contains function:

As we can see st_contains() produces the correct results.

Find a ZIP code for the given location

Starting with MySQL 5.6 one can use the MySQL spatial functions st_contains or st_within to find if the given point is inside the given polygon. In our scenario we will need to find the zip code for the given latitude and longitude. To do that in MySQL we can perform the following steps:

  1. Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts). The data will be stored as MySQL Geometry object, to convert it to text we can use astext(geom) function.
  2. Use the st_contains() or st_within() functions:

    or

Spatial Index for “ST_” functions

MyISAM tables support Spatial indexes, so the above queries will use those indexes. Example:

As we can see our spatial index is used for those functions. If we ignore or remove the index, the query will run significantly slower:

The InnoDB engine does not support spatial indexes, so those queries will be slow. As zip boundaries does not change often we can potentially use MyISAM tables for them.

Find all coffee shops in a 10-mile radius

MySQL 5.6 supports st_distance functions with 2 drawbacks:

  1. It only supports planar coordinates
  2. It does not use index

Given those major limitations, it is not very easy to use st_distance function for the geo enabled applications. If we simply need to find a distance between 2 points it is easier to store lat, lon directly and use harvesine expression (as described above).

However it is still possible to use the st_distance() if we do not need exact numbers for the distance between 2 points (i.e. we only need to sort by distance). In our example, to find all coffee shops we will need to:

  1. Get the 10 mile radius MBR and use “within()” or “st_within()” function
  2. Use st_distance function in the order by clause

First, we will calculate an envelope (square) to include approximately 10 miles, using the following approximations:

  • 1 degree of latitude ~= 69 miles
  • 1 degree of longitude ~= cos(latitude)*69 miles

@lat and @lon in this example are the coordinates for the San Francisco International Airport (SFO).

This will give us a set of coordinates (points) for the lower left and upper right corner of our square. Then we can use a MySQL’s envelope function to generate the MBR (we use linestring to draw a line between the 2 generated points and then envelope to draw an square):

The “envelope” will look like this:

envelope_example

This is not exactly a 10-mile radius, however it may be close enough. Now we can find all points around SFO airport and sort by distance.

As we can see from the explain it will use the spatial key on SHAPE and will only scan 430 rows, rather than millions of POIs.

The query does not show the exact distance (this may be ok if we only need to output the points on the map).  If we need to show the distance we can use the harvesine formula to calculate that. For example we can create the following stored function to implement the calculations:

And then use it for both order by and to displaying the distance. This query will also filter by “coffee”:

Conclusion

MySQL 5.6 implements an additional set of functions that can help create geo-enabled applications with MySQL. Storing polygons boundaries (ZIP code boundaries for example) is efficient and the new spatial functions (st_within, st_contains, etc) will produce correct results and will use spatial (rtree) indexes (for MyISAM tables only). The OpenGIS standard is very common and it is easy to obtain the data in this format or use the standard application which can “talk” this language.

Unfortunately, st_distance function is not very usable for calculating distance between 2 points on Earth and it does not use an index. In this case it is still more feasible to calculate distances manually using the harvesine formula. Hopefully this will be fixed in the next mysql release.

There are also some other limitations, for example st_union() function only supports 2 arguments and does not support an array, so it can’t be used in a queries like “select st_union(geom) from zipcodes group by state”.

Links

And finally, let me know in the comments how you use MySQL for geo enabled applications. In my next post I will talk more about basics of the MySQL geo spatial extension as well as Sphinx Search‘s implementation of the Geospatial functions.

About Alexander Rubin

Alexander joined Percona in 2013. Alexander worked with MySQL since 2000 as DBA and Application Developer. Before joining Percona he was doing MySQL consulting as a principal consultant for over 7 years (started with MySQL AB in 2006, then Sun Microsystems and then Oracle). He helped many customers design large, scalable and highly available MySQL systems and optimize MySQL performance. Alexander also helped customers design Big Data stores with Apache Hadoop and related technologies.

Comments

  1. Hi Alex,

    Thank you for this post. I am having this problem above as I would like to use your query concept above but on a database of 800,000 records. Do you think it will perform well?

    I posted my query here. Any help would be stellar!!!

    http://stackoverflow.com/questions/20393104/mysql-spatial-indexing-with-multiple-left-join-on-large-database

  2. Hi Matt,

    Here is what you need:
    1. MySQL 5.6
    2. MyISAM table with the index on field you will use MBRContains() or st_within() functions.
    3. Where condition which will filter out only needed rows.

    For example:
    select astext(shape), name from waypoints
    where
    order by st_distance(point(@lon, @lat), shape) limit 10;

    Will scan ALL rows (800K rows in your case).

    And this query:
    select astext(shape), name from waypoints
    where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
    order by st_distance(point(@lon, @lat), shape) limit 10;

    Will use an index on the shape field and only scan a number of rows.
    So, to make it work firts you will need some filtering in the where clause that will use an index, the distance/st_distance does not use a spatial index
    Please let me know if this help.

  3. Hi Alex,

    Thank you for the help. Much appreciated. Please contact me as we are interested in working with you to help optimize some of our geo location tables for our directory software platform.

  4. Hi Matt,

    I will be happy to work with you using Percona Consulting offering (http://www.percona.com/products/mysql-consulting/overview). You can also contact me by email or my linkedin (http://www.linkedin.com/in/alexanderrubin).
    Meanwhile please feel free to post any questions here.

  5. Ollie Jones says:

    Awesome write up, thanks!
    Nitpick, it’s haversine, not harvesine.

  6. Aleksander Brancewicz says:

    Hu Alex,

    I’ve followed your brilliant guide to create a db which contains poi. Unfortunately once I’m explaining the within distance query I see that if there are more then 4 items explain shows ALL as a type of a query. Could you please spell it out for my why is that and how should I overcome it?
    My table is:
    create table marketing_action
    (id int primary key auto_increment, times_seen mediumint,likes mediumint, ranking smallint,geo_position geometry not null, title varchar(35), descrption varchar(60) )
    default character set utf8 default collate utf8_general_ci
    engine = myisam;
    alter table marketing_action add spatial index geo_position_spatial_index (geo_position);

    and query which I use is:
    select harvesine(y(geo_position),x(geo_position),@lat,@lon) as dist ,X(geo_position),Y(geo_position), title from marketing_action
    where st_within(geo_position, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))));

    the data I populate is:
    truncate table marketing_action;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 30, geo_position = GeomFromText(‘POINT (4.82677 52.27516)’),title = ‘Nieuwe Schoenen bij Yanaike en Ron’;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (4.82677 52.26516)’),title = ‘Menu van de dag bij Smerige Bloem restaurant’;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 38, geo_position = GeomFromText(‘POINT (4.82677 52.22345)’),title = ‘Super korting bij Wijnen van Jordy’;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.01252 52.30230)’),title = ‘OP=OP bij Yofkes Niks’;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 22, geo_position = GeomFromText(‘POINT (5.00235 52.27000)’),title = ‘OP=OP bij kruiden’;
    insert into marketing_action set times_seen = 10,likes = 3, ranking = 15, geo_position = GeomFromText(‘POINT (4.82677 52.20123)’),title = ‘Een concert bij het restaurant happje snappje’;

    As I said once I remove 2 of random items MySQL starts using spatial index.

    Thanks up front for your answer!!

  7. Val Vinder says:

    Hi Alex,

    very informative article, thanks

    You’ve mentioned that you might be able to cover in more details in the future:

    “Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts)”

    is there any way you can publish a quick walkthrough or perhaps a link reference to the process

    Thanks a lot

  8. Val, just published the followup post with the description, please let me know if you will have any questions. See here: http://www.mysqlperformanceblog.com/2014/03/24/creating-geo-enabled-applications-mysql-5-6/

  9. Aleksander, sorry I’ve missed you comment. I believe this just too small amount of rows and MySQL will prefer to use full table scan instead of index. Have you tried with the larger data amount?

  10. Sakib says:

    Thanks for your great post. I have following scenario:

    The following query takes 22 sec in table. I wanna mention than in my business table I have 10+ million data. Also, by doing a count(name) I found it got 1992312 data (~2 million) . Is it expected or Im doing anything wrong? Im using mysql 5.6.16 and I do have a spatial index on spatial_location field.

    Query:
    set @lon = -74.00597309999999;
    set @lat= 40.7143528;
    set @dist = 200;

    set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
    set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
    set @rlat1 = @lat-(@dist/69);
    set @rlat2 = @lat+(@dist/69);

    select name from business
    where st_within(spatial_location, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
    order by st_distance(point(@lon, @lat), spatial_location)
    limit 10;

  11. Sakib says:

    This is the result of explain:

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE business range spatial_location spatial_location 34 NULL 941211 “Using where”

  12. Aleksander Brancewicz says:

    Hi Alex,

    That’s indeed a solution with more rows the query starts to be executed with an index. Thanks!

  13. Sakib,

    Yes, you are right, in this example if you have a lots of rows in the business table it can have to scan too many rows. For 200 miles it may scan a lot.

    For example, here is my test for Openstreetmaps data:
    mysql> select name from points where st_within(SHAPE, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2)))) order by st_distance(point(@lon, @lat), SHAPE) limit 10;
    +—————————-+
    | name |
    +—————————-+
    | Techno Tourist |
    | Sun Building |
    | Broadway Chambers Building |
    | NULL |
    | NULL |
    | Blue Spoon |
    | NULL |
    | City Hall (R) |
    | NULL |
    | New York Ladder Company 1 |
    +—————————-+
    10 rows in set (17.59 sec)

    Explain:
    id: 1
    select_type: SIMPLE
    table: points
    type: range
    possible_keys: SHAPE
    key: SHAPE
    key_len: 34
    ref: NULL
    rows: 522409
    Extra: Using where; Using filesort
    1 row in set (0.01 sec)

    In this case it will scan (estimated) 500K rows and will only need 10. If you have large amount of rows in the table I would suggest:
    1. Use shorter distance. Example:
    set @lon = -74.00597309999999;
    set @lat= 40.7143528;
    set @dist = 10;
    id: 1
    select_type: SIMPLE
    table: points
    type: range
    possible_keys: SHAPE
    key: SHAPE
    key_len: 34
    ref: NULL
    rows: 492
    Extra: Using where; Using filesort
    1 row in set (0.00 sec)

    Result:
    10 rows in set (0.39 sec)

    2. Use other filters in the where clause and filter by restaurants only for example.

  14. Sakib says:

    Hello Alex,
    Thanks for quick reply. So, it looks like its a performance issue of st_within() method. Prob is that I cant keep a smaller value for @dist as for rural area the amount of data would be less. I tried the same query in NoSQL database (MongoDB) and for the same query the response time is 0.3 second. But, NoSQL has poor fulltext search functionality which is better in MySQL. So, looks like Im kind of stuck in middle of MySQL and NoSQL :(

  15. Sakib,

    One potential way of fixing this problem will do the “incremental” approach, first try with 10 miles then with 25 miles, etc and stop when you will get 10 results back.

    Another way is to try SphinxSearch, it will support both fulltext search and geodistance function/

  16. Sakib says:

    Ya thats is what I was thinking too. But the main problem is to create an algorithm to get the desired number of data with as less increment step as possible. Lets think we need to fetch 1000 data. and with @dist =10, we get 150 data. So, on the next step, we should increment by 1000/150 = 6.66 ~ 7 times. Means, next query will run with @dist = (10 * 7) =70.

    I know this may not be a perfect solution but what do you think? Can you suggest me any better idea?

  17. Sakib,

    Looks ok. Another option will be to filter by type of business for example: “where st_within(…) and type=’restaurant’” (but it will not be a combined index) or “partition” the table by the type of business.

  18. Sakib says:

    Hello Alex,
    The app is a directory listing site and I need to show all the businesses within a radius around a given point. So, unfortunate I cant use any additional filter here. I will create a stored procedure and go for the incremental process that we discuss here. Thanks for helping me out :)

  19. Rahul Kumar Mehta says:

    Hi Alexander Rubin,

    I need to get distance of a ponit (x, y) from a polygon.
    Do we have any function in mysql to calculate the same??

    Regards,
    Rahul

  20. Bowo says:

    It seems like the Polygon-in-Polygon-test does not work most of the time.

    SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
    +————-+
    | ST_CON|
    +————-+
    | 0 |
    +————-+

    SELECT ST_CONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘ST_CONTAINS’;
    +————-+
    | ST_CON |
    +————-+
    | 0 |
    +————-+

  21. Bowo says:

    But It gives true answer while using MBRCONTAINS

    SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,2 2,1 1,0 1))’)) as ‘MBRCON’;
    +————-+
    | MBRCON|
    +————-+
    | 1 |
    +————-+

    SELECT MBRCONTAINS(GeomFromText(‘POLYGON((0 0,0 1,0 2,1 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((0 1,0 2,1 2,1 1,0 1))’)) as ‘MBRCON’;
    +————-+
    | MBRCON |
    +————-+
    | 1 |
    +————-+

    SELECT MBRContains(GeomFromText(‘POLYGON((0 0,0 1,2 2,1 1,1 0,0 0))’),GeomFromText(‘POLYGON((3 3,3 4,4 4,3 3))’)) as ‘MBRCON’;
    +————-+
    | MBRCON |
    +————-+
    | 0 |
    +————-+

  22. dexxtr says:

    you can also try my variant. that is how to determine point or points inside circle

    >>> SELECT * FROM locator WHERE SQRT(POW(X(center) – 49.843317 , 2) + POW(Y(center) – 24.026642, 2)) * 100 < radius

    http://dexxtr.com/post/83498801191/how-to-determine-point-inside-circle-using-mysql

    Hope this helps.

  23. Sebastian says:

    Hello Alexander,

    I would like to combine a spatial index with a Btree index on a varchar. For example I would like to know where all airports are within a certain region, and to speed up the query I don’t want to consider rows in my table that represent coffeeshops.

    So in addition to the spatial index I also want to provide a type of the place, is that possible? And could you give me some hints on where I could find the information to do so?

    My question is also here:
    http://stackoverflow.com/questions/24088337/how-can-i-combine-a-btree-and-geograhpical-or-spatial-index-to-speed-up-my-query

    Thanks a lot in advance.

    Sebastian

  24. Sebastian,

    Unfortunately, it does not work – MySQL will try to choose the best index, based on index statistics, either spatial or btree. What you can do thou is completely avoid using spatial index, calculate the MBR manually.
    Here are 2 way to do it.
    a) Create a table with airports only
    b)
    1. Add lat, lon declared as double (not a point) and populate it
    2. Calculate MBR in the code approximately
    3. Use the query
    Select …
    where point_type = ‘airport’ and lat between X1 and Y1 and lon between X2 and Y2
    order by haversine_distance() … desc limit 10;

    Where X1, Y1, X2, Y2 are constant that you have calculated manually to “draw” a rectangle, similar to contains does.

    Please let me know of this make sense

  25. Rahul Kumar Mehta,

    st_distance can calculate the distance between point and polygon. You can also use centroid() function to get the center of the polygon.

  26. Sebastian says:

    Hi Alexander,

    Thanks for your help and quick reply. I tried option b) the approach using a multicolumn BTREE index, however it appeared to be twice as slow as my original approach using the spatial index. Therefore I will stick with the spatial index for now and split up my table for different types of places, which I think is the best solution in terms of performance.

  27. dedy says:

    hi mr.alex, my english not good but i need ur help

    i will to show polyline to leaflet map from mysql database, my coordinate type is: linestring,
    values ex: -7.98478 112.63213,-7.90000 112.60000 (two point).

    this leaflet function for showing polyline to map
    L.polyline([[-7.94432, 112.64917],[-7.94993, 112.66085],[-7.95401, 112.69947]]).addTo(map);

    what query can use?

Speak Your Mind

*