Tips and Tricks For Writing PostGIS Spatial Queries
Leo Hsu and Regina Obe
Paragon Corporation http://www.paragoncorporation.com
PostGIS in Action http://www.manning.com/obe (our upcoming book!)
Useful Links:
PostGIS http://postgis.refractions.net
PostGIS Trac and Wiki http://trac.osgeo.org/postgis
Boston GIS http://www.bostongis.com
Postgres On Line Journal http://www.postgresonline.com
New Features in PostGIS 1.4
Faster Aggregates
Cascaded Union (union 40,000 polygons in seconds instead of in your dreams); needs GEOS 3.1.1 and above
Prepared Geometries for improved ST_Intersects, ST_Within, ST_Contains; needs GEOS 3.1+
It is out
Speed Test 1: Polygon union
Takes 26 secs
Speed Test 2: Union and Transform
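The test queries themselves are not reproduced here. As a rough sketch of the kind of statement such tests time, an aggregate union followed by a transform could look like the query below. It assumes the states table used later in this deck is stored in SRID 2163 and that the target is WGS 84 (SRID 4326); both are assumptions, not the original benchmark.

```sql
-- Dissolve all state polygons into one geometry, then reproject the result.
-- Assumption: states.the_geom is in SRID 2163; 4326 is an illustrative target.
-- Transforming once after the union avoids reprojecting every input polygon.
SELECT ST_Transform(ST_Union(the_geom), 4326) As the_geom
FROM states;
```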
New Features in PostgreSQL 8.4
Windowing Functions
Common Table Expressions and Recursive Common Table Expressions
Unnest, array_agg
More efficient query planner: better results with COUNT, IN, EXISTS, INTERSECT and EXCEPT clauses; improved Hash indexes
Faster database restore
PgMigrator for in place upgrade from 8.3 to 8.4
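A small sketch of array_agg and unnest working together inside a Common Table Expression. It borrows the building table and its bldg_type and bldg_name columns from the nearest neighbor example later in this deck; the query itself is illustrative only.

```sql
-- Collapse building names into one array per building type (array_agg),
-- then expand the arrays back into rows (unnest).
WITH agg AS (
  SELECT bldg_type, array_agg(bldg_name) As bldg_names
  FROM building
  GROUP BY bldg_type
)
SELECT bldg_type, unnest(bldg_names) As bldg_name
FROM agg;
```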
Tip: Add indexes AFTER bulk insert
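A minimal sketch of the load-then-index order. The staging table staging_building and the index name idx_building_the_geom are hypothetical names used only for illustration; the building table and its columns come from the examples later in this deck.

```sql
-- 1. Bulk load while the target table has no spatial index,
--    so the index is not updated row by row during the load.
INSERT INTO building (bldg_name, bldg_type, the_geom)
SELECT bldg_name, bldg_type, the_geom
FROM staging_building;  -- hypothetical staging table

-- 2. Build the spatial index once, after the data is in place.
CREATE INDEX idx_building_the_geom ON building USING gist (the_geom);

-- 3. Refresh planner statistics so the new index gets considered.
VACUUM ANALYZE building;
```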
WGS 84
--yields 1.23567.. degrees (what do we do with this?)
SELECT a.state As st_a, b.state As st_b,
ST_Distance(a.the_geom, b.the_geom) As dist_deg
FROM us.states_wgs84 AS a
CROSS JOIN us.states_wgs84 AS b
WHERE a.state = 'Maine'
and b.state = 'Rhode Island';
--yields 131,103 meters
SELECT a.state As st_a, b.state As st_b,
ST_Distance(a.the_geom, b.the_geom) As dist_m
FROM us.states AS a
CROSS JOIN us.states AS b
WHERE a.state = 'Maine'
and b.state = 'Rhode Island';
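If only the WGS 84 table is available, one option is to transform on the fly before measuring. This is a sketch under two assumptions: us.states_wgs84 is stored in SRID 4326, and SRID 2163 (US National Atlas Equal Area, units in meters, the SRID used for the grid example later in this deck) is an acceptable projection for the measurement.

```sql
-- Reproject both geometries to a meter-based SRS so ST_Distance returns meters.
-- Assumption: source SRID is 4326; 2163 is chosen only because it appears elsewhere in this deck.
SELECT a.state As st_a, b.state As st_b,
  ST_Distance(ST_Transform(a.the_geom, 2163),
              ST_Transform(b.the_geom, 2163)) As dist_m
FROM us.states_wgs84 AS a
CROSS JOIN us.states_wgs84 AS b
WHERE a.state = 'Maine'
  AND b.state = 'Rhode Island';
```

Transforming per query costs time on every run; storing a pre-transformed copy, as the second query above does with us.states, avoids that.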
Tip: Use the graphical explain in PgAdmin
WITH nn AS (
SELECT h.gid AS hyd_id,
h.hyd_name,ROW_NUMBER() OVER(PARTITION BY h.gid
ORDER BY ST_Distance(h.the_geom, b.the_geom)) As
row_num,
b.bldg_name,b.bldg_type,ST_Distance(b.the_geom, h.the_geom) As
dist_to_lake
FROM building As b INNER JOIN
hydrology As h ON (ST_DWithin(h.the_geom, b.the_geom, 50000) )
)
SELECT nn.*
FROM nn
WHERE nn.row_num <= 5
ORDER BY nn.hyd_name, nn.hyd_id, nn.row_num;
EXPLAIN ANALYZE VERBOSE output for the query above:
Sort (cost=30.28..30.29 rows=1 width=402) (actual time=1149.990..1149.999 rows=20 loops=1)
  Output: nn.hyd_id, nn.hyd_name, nn.row_num, nn.bldg_name, nn.bldg_type, nn.dist_to_lake
  Sort Key: nn.hyd_name, nn.hyd_id, nn.row_num
  Sort Method: quicksort Memory: 19kB
  CTE nn
    -> WindowAgg (cost=30.22..30.25 rows=1 width=980) (actual time=773.909..1146.397 rows=1968 loops=1)
         Output: h.gid, h.hyd_name, row_number() OVER (?), b.bldg_name, b.bldg_type, st_distance(b.the_geom, h.the_geom)
         -> Sort (cost=30.22..30.23 rows=1 width=980) (actual time=773.847..777.443 rows=1968 loops=1)
              Output: h.gid, h.hyd_name, b.bldg_name, b.bldg_type, b.the_geom, h.the_geom
              Sort Key: h.gid, (st_distance(h.the_geom, b.the_geom))
              Sort Method: external merge Disk: 1736kB
              -> Nested Loop (cost=0.00..30.21 rows=1 width=980) (actual time=0.149..755.012 rows=1968 loops=1)
                   Output: h.gid, h.hyd_name, b.bldg_name, b.bldg_type, b.the_geom, h.the_geom
                   Join Filter: (_st_dwithin(h.the_geom, b.the_geom, 50000::double precision) AND (h.the_geom && st_expand(b.the_geom, 50000::double precision)))
                   -> Seq Scan on hydrology h (cost=0.00..1.04 rows=4 width=354) (actual time=0.008..0.015 rows=4 loops=1)
                        Output: h.gid, h.hyd_name, h.hyd_type, h.the_geom
                   -> Index Scan using assets_building_idx_the_geom on building b (cost=0.00..7.27 rows=1 width=626) (actual time=0.054..0.748 rows=492 loops=4)
                        Output: b.gid, b.bldg_name, b.bldg_type, b.the_geom
                        Index Cond: (b.the_geom && st_expand(h.the_geom, 50000::double precision))
  -> CTE Scan on nn (cost=0.00..0.02 rows=1 width=402) (actual time=773.920..1149.886 rows=20 loops=1)
       Output: nn.hyd_id, nn.hyd_name, nn.row_num, nn.bldg_name, nn.bldg_type, nn.dist_to_lake
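The same kind of text plan can be produced outside PgAdmin by prefixing a query with EXPLAIN ANALYZE VERBOSE. The sketch below uses a cut-down version of the join from this example. The plan above also reports "Sort Method: external merge Disk", meaning the sort spilled to disk; raising work_mem for the session is one thing to try, and the 16MB value here is an arbitrary illustration, not a recommendation.

```sql
-- Optional: give sorts more memory for this session only (value is illustrative).
SET work_mem = '16MB';

-- ANALYZE actually runs the query and reports real timings;
-- VERBOSE adds the Output lines for each plan node.
EXPLAIN ANALYZE VERBOSE
SELECT h.hyd_name, b.bldg_name
FROM building As b INNER JOIN hydrology As h
  ON ST_DWithin(h.the_geom, b.the_geom, 50000);
```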
Nearest neighbor queries
SELECT nn.*
FROM (
SELECT
h.gid AS hyd_id,
h.hyd_name,
ROW_NUMBER() OVER(PARTITION BY h.gid ORDER BY ST_Distance(h.the_geom,b.the_geom))
As row_num,
b.bldg_name,
b.bldg_type,
ST_Distance(b.the_geom, h.the_geom) As dist_to_lake
FROM building As b INNER JOIN hydrology As h
ON ST_DWithin(h.the_geom, b.the_geom, 50000)
) As nn
WHERE nn.row_num <= 5
ORDER BY nn.hyd_name, nn.hyd_id, nn.row_num;
Tip: If asking "what is not" is too slow, ask the opposite question
You know what is not if you can ask for the universe and for what is: what is not is the universe minus what is.
How do you ask what is without losing the universe?
Use a LEFT JOIN instead of an INNER JOIN, then keep only the rows that found no match:
SELECT t1.field1, t1.field2 FROM t1 LEFT JOIN t2 ON (the what is condition) WHERE t2.some_non_null_key IS NULL;
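A concrete sketch of the pattern, reusing the building and hydrology tables from the nearest neighbor example. It assumes gid is a non-null key on both tables and reuses the 50000 unit distance from that example.

```sql
-- Buildings with NO hydrology feature within 50000 units.
-- The LEFT JOIN keeps every building (the universe); buildings that found
-- no match carry NULLs in the hydrology columns, and the WHERE keeps only those.
SELECT b.gid, b.bldg_name
FROM building As b
LEFT JOIN hydrology As h
  ON ST_DWithin(b.the_geom, h.the_geom, 50000)
WHERE h.gid IS NULL;
```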
Example: What has no close neighbors
The more vertices you have the slower your distance calculation: CA has 10,210 pts and TX has 12,167 pts.
After simplification, CA has 873 pts and TX has 1,653 pts.
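Vertex counts like these can be checked with ST_NPoints. The sketch below assumes the states table used in the following queries and the tolerance of 700 used there; the spellings 'California' and 'Texas' in the state column are assumed.

```sql
-- Compare vertex counts before and after simplification.
SELECT state,
  ST_NPoints(the_geom) As npoints_before,
  ST_NPoints(ST_SimplifyPreserveTopology(the_geom, 700)) As npoints_after
FROM states
WHERE state IN ('California', 'Texas');
```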
--doesn't use an index but less costly dwithin check (9,032 ms)
-- but at 2 or more beats the above for this small dataset (limit 2: 9,734 ms)
SELECT a.state As st_a, b.state As st_b
FROM states AS a CROSS JOIN states AS b
WHERE
NOT (a.state = b.state)
AND ST_DWithin(ST_SimplifyPreserveTopology(a.the_geom,700),
ST_SimplifyPreserveTopology(b.the_geom,700),1000)
LIMIT 1;
--uses an index and less costly dwithin check (422 ms, at limit 2: 656 ms)
--If you dared run this across all the states (no limit)
-- it finishes in 42,687 ms; with the other 2 you'd be waiting a long time
-- (note: can get faster with even more simplification)
SELECT a.state As st_a, b.state As st_b
FROM states AS a CROSS JOIN states AS b
WHERE
NOT (a.state = b.state)
AND (ST_Expand(a.the_geom,700) && b.the_geom)
AND _ST_DWithin(ST_SimplifyPreserveTopology(a.the_geom,700), ST_SimplifyPreserveTopology(b.the_geom,700),1000)
LIMIT 1;
Compartmentalize commonly used constructs
An SQL function is transparent to the planner
If your function can benefit from an index, try to make it transparent to the planner by writing it in SQL. The function below still uses an index:
CREATE FUNCTION sql_ST_DWithin_Simplify(geom1 geometry, geom2 geometry, dist double precision,
simplify_tolerance double precision)
RETURNS boolean
AS
$$ SELECT ST_Expand($1, $3) && $2 AND ST_Expand($2, $3) && $1
AND _ST_DWithin(ST_SimplifyPreserveTopology($1,$4),ST_SimplifyPreserveTopology($2,$4), $3)
$$
language 'sql' IMMUTABLE;
---uses an index and less costly dwithin (limit 5: 1906 ms, no limit: 42,141 ms)
SELECT a.state As st_a, b.state As st_b
FROM states AS a
CROSS JOIN states AS b
WHERE
NOT (a.state = b.state)
AND sql_ST_DWithin_Simplify(a.the_geom, b.the_geom, 1000,700)
limit 2;
Compartmentalize commonly used constructs
Other functions (e.g. plpgsql) are NOT transparent to the planner
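For contrast, here is a sketch of the same logic wrapped in plpgsql. The function name plpgsql_ST_DWithin_Simplify is made up for this illustration. Because the planner cannot see inside the function body, the && bounding box tests stay hidden from it and no spatial index would be expected to come into play when this version is used in a join.

```sql
-- Same body as sql_ST_DWithin_Simplify, but opaque to the planner.
CREATE FUNCTION plpgsql_ST_DWithin_Simplify(geom1 geometry, geom2 geometry,
    dist double precision, simplify_tolerance double precision)
RETURNS boolean
AS
$$
BEGIN
  RETURN ST_Expand(geom1, dist) && geom2
     AND ST_Expand(geom2, dist) && geom1
     AND _ST_DWithin(ST_SimplifyPreserveTopology(geom1, simplify_tolerance),
                     ST_SimplifyPreserveTopology(geom2, simplify_tolerance),
                     dist);
END;
$$
LANGUAGE plpgsql IMMUTABLE;
```

Comparing the explain plans of the two versions on the states self-join is a quick way to check whether the index is being used.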
WITH
usext AS -- Define a CTE to store our base variables (extent and our x,y grid count)
(SELECT ST_SetSRID(CAST(ST_Extent(the_geom) As geometry),2163) As the_geom_ext, 10 as x_gridcnt, 10 as y_gridcnt
FROM states As s
WHERE state = 'Texas'),
grid_dim AS -- Define a CTE to store our grid dimension width and height that uses usext
(SELECT
(ST_XMax(the_geom_ext) - ST_XMin(the_geom_ext))/x_gridcnt As g_width,
ST_XMin(the_geom_ext) As xmin, ST_xmax(the_geom_ext) As xmax,
(ST_YMax(the_geom_ext) - ST_YMin(the_geom_ext))/y_gridcnt As g_height,
ST_YMin(the_geom_ext) As ymin, ST_YMax(the_geom_ext) As ymax
FROM usext),
grid As -- Define CTE to store our grid that uses usext and grid_dim
(SELECT x, y, ST_SetSRID(ST_MakeBox2d(ST_Point(xmin + (x - 1)*g_width, ymin + (y-1)*g_height),
ST_Point(xmin + x*g_width, ymin + y*g_height)), 2163) As grid_geom
FROM
(SELECT generate_series(1,x_gridcnt) As x FROM usext) As xs CROSS JOIN
(SELECT generate_series(1,y_gridcnt) As y FROM usext) As ys CROSS JOIN
grid_dim
)
--Use grid to clip Texas and bulk insert the clipped pieces into a new on-the-fly table
SELECT state, state_fips, ST_Intersection(s.the_geom, grid_geom) As newgeom
INTO us.texas_diced_g10
FROM states As s INNER JOIN grid ON s.state = 'Texas' AND ST_Intersects(s.the_geom, grid.grid_geom);
The fast way to register a new geometry and put constraints on it. No need
for AddGeometryColumn if you have PostGIS 1.4
SELECT populate_geometry_columns('us.texas_diced_g10'::regclass);
Automatically adds an entry to geometry_columns table for us.texas_diced_g10 by inspecting our table for
type, dimension, and SRID of geometry columns.
Creates a constraint on the new table column if it can (SRID, geometry type, dimension check constraints)
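To confirm the registration, the new row can be read back from geometry_columns. A small sketch, assuming the table was created in the us schema as above:

```sql
-- Inspect what populate_geometry_columns recorded for the new table.
SELECT f_table_schema, f_table_name, f_geometry_column,
       coord_dimension, srid, type
FROM geometry_columns
WHERE f_table_schema = 'us'
  AND f_table_name = 'texas_diced_g10';
```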
Tip: Use the knife trick to bisect geometries