From: Regina Obe
Date: Tue, 15 Feb 2011 13:49:06 +0000 (+0000)
Subject: Make work on PostgreSQL 8.4 (was using some syntax only allowed in 9.0+). Also accou...
X-Git-Tag: 2.0.0alpha1~1977
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=e84a603c34ec6e618bcd6d759fa2d513c7a7e785;p=postgis

Make work on PostgreSQL 8.4 (was using some syntax only allowed in 9.0+). Also account for the odd/even side of street rule. And of course that demonstrated there is a bug somewhere (most likely in the geocoder), but I've got to pull up some maps to see which is right. One of these has the address orientations flipped (odd/even not right). Of course I have to be right :). Boy, do I hate it when you program the reverse of something and it exposes a bug. Also fix some minor documentation.

git-svn-id: http://svn.osgeo.org/postgis/trunk@6823 b70326c6-7e19-0410-871a-916f4a2858ee
---
diff --git a/doc/extras_tigergeocoder.xml b/doc/extras_tigergeocoder.xml
index 2dd738860..04116f059 100644
--- a/doc/extras_tigergeocoder.xml
+++ b/doc/extras_tigergeocoder.xml
@@ -6,16 +6,17 @@
 A plpgsql based geocoder written to work with the TIGER (Topologically Integrated Geographic Encoding and Referencing system ) / Line and Master Address database export released by the US Census Bureau. In prior versions the TIGER files were released in ASCII format. The older geocoder used to work with that format is in extras/tiger_geocoder/tiger_2006andbefore.
- There are three components to the geocoder: the data loader functions, the address normalizer and the address geocoder. The latest version updated to use the TIGER 2010 census data is located in the extras/tiger_geocoder/tiger_2010 folder.
+ There are four components to the geocoder: the data loader functions, the address normalizer, the address geocoder, and the reverse geocoder. The latest version updated to use the TIGER 2010 census data is located in the extras/tiger_geocoder/tiger_2010 folder.
 Although it is designed specifically for the US, a lot of the concepts and functions are applicable and can be adapted to work with other country address and road networks. The script builds a schema called tiger to house all the tiger related functions, reusable lookup data such as road type prefixes, suffixes, states, various control tables for managing data load, and skeleton base tables from which all the tiger loaded tables inherit from. Another schema called tiger_data is also created which houses all the census data for each state that the loader downloads from Census site and loads into the database. In the current model, each set of state tables is
- prefixed with the state code e.g ma_addr, ca_edges etc with constraints to enforce only that state data. Each of these tables inherits from the base addr, faces, egdes, etc located in the tiger schema.
+ prefixed with the state code e.g ma_addr, ca_edges etc with constraints to enforce only that state data. Each of these tables inherits from the base addr, faces, edges, etc located in the tiger schema.
 All the geocode functions only reference the base tables, so there is no requirement that the data schema be called tiger_data or that data can't be further partitioned into other schemas -- e.g a different schema for each state, as long as all the tables inherit from the tables in the tiger schema. Design: The goal of this project is to build a fully functional geocoder that can process an arbitrary address string and using normalized TIGER census data, produce a point geometry and rating reflecting the location of the given address and likeliness of the location.
+ Introduced in PostGIS 2.0.0 is a reverse geocoder useful for deriving the street address and cross streets of a GPS location.
 The geocoder should be simple for anyone familiar with PostGIS to install and use, and should be easily installable and usable on all platforms supported by PostGIS.
 It should be robust enough to function properly despite formatting and spelling errors. It should be extensible enough to be used with future data updates, or alternate data sources with a minimum of coding changes.
@@ -396,6 +397,20 @@ UNZIPTOOL=unzip
 Takes a geometry point in a known spatial ref and returns a record containing an array of possible addresses and an array of cross streets. If include_strnum_range = true, includes the street range in the cross streets. include_strnum_range defaults to false if not passed in. Addresses are sorted according to which road a point is closest to so first address is most likely the right one.
+ Note: This function relies on TIGER data. If you have not loaded data covering the region of this point, you will get a record filled with NULLs.
+ Returned elements of the record are as follows:
+
+
+ intpt is an array of points: These are the center line points on the street closest to the input point. There are as many points as there are addresses.
+
+
+ addy is an array of norm_addy (normalized addresses): These are the possible addresses that fit the input point. The first one in the array is the most likely.
+ Generally there should be only one, except when a point is at the corner of 2 or 3 streets, or the point is on the road itself rather than off to one side.
+
+
+ street is an array of varchar: These are the cross streets (or the street itself), that is, the streets that intersect, or are, the street the point is projected onto.
+
+
 Availability: 2.0.0
@@ -406,7 +421,8 @@ UNZIPTOOL=unzip
 Examples
 Example of a point at the corner of two streets, but closest to one. This is approximate location of MIT: 77 Massachusetts Ave, Cambridge, MA 02139 Note that although we don't have 3 streets, PostgreSQL will just return null for entries above our upper bound so safe to use.
 This includes street ranges
- SELECT pprint_addy(r.addy[1]) As st1, pprint_addy(r.addy[2]) As st2, pprint_addy(r.addy[3]) As st3, array_to_string(r.street, ',') As cross_streets
+ SELECT pprint_addy(r.addy[1]) As st1, pprint_addy(r.addy[2]) As st2, pprint_addy(r.addy[3]) As st3,
+ array_to_string(r.street, ',') As cross_streets
 FROM reverse_geocode(ST_GeomFromText('POINT(-71.093902 42.359446)',4269),true) As r;
 result
@@ -417,27 +433,32 @@ UNZIPTOOL=unzip
 Here we choose not to include the address ranges for the cross streets and picked a location really really close to a corner of 2 streets thus could be known by two different addresses.
-SELECT pprint_addy(r.addy[1]) As st1, pprint_addy(r.addy[2]) As st2, pprint_addy(r.addy[3]) As st3,
-array_to_string(r.street, ',') As cross_str
+SELECT pprint_addy(r.addy[1]) As st1, pprint_addy(r.addy[2]) As st2,
+pprint_addy(r.addy[3]) As st3, array_to_string(r.street, ',') As cross_str
 FROM reverse_geocode(ST_GeomFromText('POINT(-71.06941 42.34225)',4269)) As r;
 result
 --------
                st1               |               st2               | st3 |       cross_str
 ---------------------------------+---------------------------------+-----+------------------------
- 4 Bradford St, Boston, MA 02118 | 48 Waltham St, Boston, MA 02118 |     | Bradford St,Waltham St
+ 5 Bradford St, Boston, MA 02118 | 49 Waltham St, Boston, MA 02118 |     | Bradford St,Waltham St
-For this one we reuse our geocoded example from and we only want the primary address and at most 2 cross streets.
-SELECT actual_addr, lon, lat, pprint_addy((rg).addy[1]) As int_addr1, (rg).street[1] As cross1, (rg).street[2] As cross2
-FROM (SELECT address As actual_addr,
+For this one we reuse our geocoded example from and we only want the primary address and at most 2 cross streets.
+TODO: Fix suspected bug in geocode (guessing wrong side of street -- suspect geocode is wrong and reverse_geocode is right by spot check of map).
+SELECT actual_addr, lon, lat, pprint_addy((rg).addy[1]) As int_addr1,
+ (rg).street[1] As cross1, (rg).street[2] As cross2
+FROM (SELECT address As actual_addr,
 lon, lat, reverse_geocode( ST_SetSRID(ST_Point(lon,lat),4326) ) As rg
 FROM addresses_to_geocode WHERE rating IS NOT NULL) As foo;
                  actual_addr                  |                 int_addr1                 |      cross1       |    cross2
-----------------------------------------------+-------------------------------------------+-------------------+--------------
- 529 Main Street, Boston MA, 02129            | 538 Main St, Boston, MA 02129             | Main St           |
- 77 Massachusetts Avenue, Cambridge, MA 02139 | 59 Massachusetts Ave, Cambridge, MA 02139 | Massachusetts Ave | Wellesley St
+                     actual_addr                     |    lon    |   lat    |          int_addr1           |      cross1       |     cross2
+-----------------------------------------------------+-----------+----------+------------------------------+-------------------+-----------------
+ 529 Main Street, Boston MA, 02129                   | -71.07187 | 42.38351 | 538 Main St, Boston, MA 02129| Main St           |
+ 77 Massachusetts Avenue, Cambridge, MA 02139        | -71.09436 | 42.35981 | 60 Massachusetts Ave, Cambr..| Massachusetts Ave | Wellesley St
+ 28 Capen Street, Medford, MA                        | -71.12184 | 42.41010 | 29 Capen St, Malden, MA 02155| Capen St Exd      |
+ 124 Mount Auburn St, Cambridge, Massachusetts 02138 | -71.12298 | 42.37336 | 2 University Rd, Belmont, M..| University Rd     | Mount Auburn St
+ 950 Main Street, Worcester, MA 01610                | -71.82361 | 42.24948 | 963 Main St, Worceste.. 01603| Main St           | Crystal St
diff --git a/extras/tiger_geocoder/tiger_2010/geocode/reverse_geocode.sql b/extras/tiger_geocoder/tiger_2010/geocode/reverse_geocode.sql
index e8479be53..720f7691b 100644
--- a/extras/tiger_geocoder/tiger_2010/geocode/reverse_geocode.sql
+++ b/extras/tiger_geocoder/tiger_2010/geocode/reverse_geocode.sql
@@ -20,6 +20,7 @@ DECLARE
 var_states text[];
 var_addy NORM_ADDY;
 var_strnum varchar;
+ var_nstrnum numeric(10);
 var_primary_line geometry := NULL;
 var_primary_dist numeric(10,2) ;
 var_pt geometry;
@@ -46,15 +47,15 @@ BEGIN
 FOR var_redge IN
 SELECT *
 FROM (SELECT DISTINCT ON(fullname) foo.fullname, foo.stusps, foo.zip,
- (SELECT z.place FROM zip_state_loc AS z WHERE z.zip = foo.zip and z.statefp = foo.statefp LIMIT 1) As place, foo.intpt,
+ (SELECT z.place FROM zip_state_loc AS z WHERE z.zip = foo.zip and z.statefp = foo.statefp LIMIT 1) As place, foo.center_pt,
 side, to_number(fromhn, '999999') As fromhn, to_number(tohn, '999999') As tohn, ST_GeometryN(ST_Multi(line),1) As line, foo.dist
 FROM
- (SELECT e.the_geom As line, e.fullname, a.zip, s.stusps, ST_ClosestPoint(e.the_geom, var_pt) As intpt, e.statefp, a.side, a.fromhn, a.tohn, ST_Distance_Sphere(e.the_geom, var_pt) As dist
+ (SELECT e.the_geom As line, e.fullname, a.zip, s.stusps, ST_ClosestPoint(e.the_geom, var_pt) As center_pt, e.statefp, a.side, a.fromhn, a.tohn, ST_Distance_Sphere(e.the_geom, var_pt) As dist
 FROM edges AS e INNER JOIN state As s ON (e.statefp = s.statefp AND s.statefp = ANY(var_states) )
 INNER JOIN faces As fl ON (e.tfidl = fl.tfid AND e.statefp = fl.statefp)
 INNER JOIN faces As fr ON (e.tfidr = fr.tfid AND e.statefp = fr.statefp)
 INNER JOIN addr As a ON ( e.tlid = a.tlid AND e.statefp = a.statefp AND
- ( ( ST_Contains(fl.the_geom, var_pt) AND a.side = 'L') OR ( ST_Contains(fr.the_geom, var_pt) AND a.side = 'R' ) ) )
+ ( ( ST_Covers(fl.the_geom, var_pt) AND a.side = 'L') OR ( ST_Covers(fr.the_geom, var_pt) AND a.side = 'R' ) ) )
 -- INNER JOIN zip_state_loc As z ON (a.statefp = z.statefp AND a.zip = z.zip) /** really slow with this join **/
 WHERE e.statefp = ANY(var_states) AND a.statefp = ANY(var_states) AND ST_DWithin(e.the_geom, var_pt, 0.005)
 ORDER BY ST_Distance_Sphere(e.the_geom, var_pt) LIMIT 4) As foo
@@ -66,13 +67,18 @@ BEGIN
 END IF;
 -- We only consider other edges as matches if they intersect our primary edge -- that would mean we are at a corner place
 IF ST_Intersects(var_redge.line, var_primary_line) THEN
- intpt := array_append(intpt,var_redge.intpt);
+ intpt := array_append(intpt,var_redge.center_pt);
 IF var_redge.fullname IS NOT NULL THEN
 street := array_append(street, (CASE WHEN include_strnum_range THEN COALESCE(var_redge.fromhn::varchar, '')::varchar || ' - ' || COALESCE(var_redge.tohn::varchar,'')::varchar || ' '::varchar ELSE '' END::varchar || var_redge.fullname::varchar)::varchar);
 --interploate the number
 -- note that if fromhn > tohn we will be subtracting which is what we want
 -- We only consider differential distances are reeally close from our primary pt
 IF var_redge.dist < var_primary_dist*1.1 THEN
- var_strnum := (var_redge.fromhn + ST_Line_Locate_Point(var_redge.line, var_pt)*(var_redge.tohn - var_redge.fromhn))::numeric(10)::varchar;
+ var_nstrnum := (var_redge.fromhn + ST_Line_Locate_Point(var_redge.line, var_pt)*(var_redge.tohn - var_redge.fromhn))::numeric(10);
+ -- The odd even street number side of street rule
+ IF (var_nstrnum % 2) != (var_redge.tohn % 2) THEN
+ var_nstrnum := CASE WHEN var_nstrnum + 1 NOT BETWEEN var_redge.fromhn AND var_redge.tohn THEN var_nstrnum - 1 ELSE var_nstrnum + 1 END;
+ END IF;
+ var_strnum := var_nstrnum::varchar;
 var_addy := normalize_address( COALESCE(var_strnum::varchar || ' ', '') || var_redge.fullname || ', ' || var_redge.place || ', ' || var_redge.stusps || ' ' || var_redge.zip);
 addy := array_append(addy, var_addy);
 END IF;
@@ -82,7 +88,7 @@ BEGIN
 RETURN;
 END;
-$_$ LANGUAGE plpgsql;
+$_$ LANGUAGE plpgsql STABLE;
 CREATE OR REPLACE FUNCTION reverse_geocode(IN pt geometry, OUT intpt geometry[], OUT addy NORM_ADDY[],
@@ -92,4 +98,4 @@ $$
 -- default to not including street range in cross streets
 SELECT reverse_geocode($1,false);
 $$
-language sql;
\ No newline at end of file
+language sql STABLE;
\ No newline at end of file
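You can try the odd/even side-of-street rule added to reverse_geocode.sql on its own. The following is a minimal sketch in plain PL/pgSQL that needs only core PostgreSQL, with no PostGIS or TIGER data. `interpolate_house_number` is a hypothetical name and is not a PostGIS function. `frac` stands in for the 0..1 fraction that ST_Line_Locate_Point returns for the point on the matched edge. The sketch assumes that all house numbers on one side of the range share the parity of tohn. It differs from the committed code in one respect: it uses BETWEEN SYMMETRIC. Plain `x BETWEEN a AND b` is never true when a > b, so for a descending range (fromhn > tohn) the committed CASE always steps down by one.

```sql
-- Hypothetical helper (not part of PostGIS): interpolate a house number
-- along a street segment, then apply the odd/even side-of-street rule.
-- frac stands in for the 0..1 fraction ST_Line_Locate_Point would return.
CREATE OR REPLACE FUNCTION interpolate_house_number(
    fromhn numeric, tohn numeric, frac numeric)
RETURNS numeric AS
$$
DECLARE
    -- numeric(10) has scale 0, so assignment rounds to a whole number
    n numeric(10);
BEGIN
    -- Linear interpolation; when fromhn > tohn this subtracts, as intended
    n := fromhn + frac * (tohn - fromhn);
    -- Assumption: numbers on this side of the range share tohn's parity
    IF (n % 2) != (tohn % 2) THEN
        -- Step by one, staying inside the range whichever way it runs
        n := CASE WHEN n + 1 BETWEEN SYMMETRIC fromhn AND tohn
                  THEN n + 1 ELSE n - 1 END;
    END IF;
    RETURN n;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Usage: interpolating halfway along each range
SELECT interpolate_house_number(2, 100, 0.5);  -- 51 is odd, returns 52
SELECT interpolate_house_number(1, 99, 0.5);   -- 50 is even, returns 51
SELECT interpolate_house_number(100, 2, 0.5);  -- descending range, returns 52
```

This helper is marked IMMUTABLE because it reads no tables. The committed reverse_geocode is marked STABLE because it queries the TIGER tables.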