From ef52eb68b95bbb989cca3c6a64cbf028e1fa1bd7 Mon Sep 17 00:00:00 2001
From: Regina Obe
Date: Mon, 13 Oct 2014 18:07:51 +0000
Subject: [PATCH] get rid of windows \r

git-svn-id: http://svn.osgeo.org/postgis/trunk@13063 b70326c6-7e19-0410-871a-916f4a2858ee
---
 doc/extras_address_standardizer.xml | 848 ++++++++++++++--------------
 1 file changed, 424 insertions(+), 424 deletions(-)

diff --git a/doc/extras_address_standardizer.xml b/doc/extras_address_standardizer.xml
index 788e8a19a..f8995e8f8 100644
--- a/doc/extras_address_standardizer.xml
+++ b/doc/extras_address_standardizer.xml
@@ -1,424 +1,424 @@
- - - Address Standardizer - This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer). - The address standardizer is a single line address parser that takes an input address and normalizes it based on a set of rules stored in a table and helper lex and gaz tables. - The code is built into a single postgresql extension library called address_standardizer which can be installed with CREATE EXTENSION address_standardizer;. - The code for this extension can be found in the PostGIS extensions/address_standardizer and is currently self-contained. - For installation instructions refer to: . - How the Parser Works - The parser works from right to left looking first at the macro elements - for postcode, state/province, city, and then looks micro elements to determine -if we are dealing with a house number street or intersection or landmark. -It currently does not look for a country code or name, but that could be -introduced in the future. - - - Country code - Assumed to be US or CA based on: postcode as US or Canada state/province as US or Canada else US - - - Postcode/zipcode - These are recognized using Perl compatible regular expressions. -These regexs are currently in the parseaddress-api.c and are relatively -simple to make changes to if needed. 
- - - State/province - These are recognized using Perl compatible regular expressions. -These regexs are currently in the parseaddress-api.c but could get moved -into includes in the future for easier maintenance. - - - - - - - This section lists the PostgreSQL data types installed by Address Standardizer extension. Note we describe the casting behavior of these which is very - important especially when designing your own functions. - - - - Address Standardizer Types - - - stdaddr - A composite type that consists of the elements of an address. This is the return type for standardize_address function. - - - Description - A composite type that consists of elements of an address. This is the return type for function. Some descriptions for elements are borrowed from PAGC Postal Attributes. - The token numbers denote the reference number in the rules table. - &address_standardizer_required; - - - building - - is text (token number 0): Refers to building number or name. Unparsed building identifiers and types. Generally blank for most addresses. - - - house_num - - is a text (token number 1): This is the street number on a street. Example 75 in 75 State Street. - - - predir - is text (token number 2): STREET NAME PRE-DIRECTIONAL such as North, South, East, West etc. - - qual - - is text (token number 3): STREET NAME PRE-MODIFIER Example OLD in 3715 OLD HIGHWAY 99. - - - pretype - - is text (token number 4): STREET PREFIX TYPE - - - name - - is text (token number 5): STREET NAME - - - suftype - - is text (token number 6): STREET POST TYPE e.g. St, Ave, Cir. A street type following the root street name. Example STREET in 75 State Street. - - - sufdir - - is text (token number 7): STREET POST-DIRECTIONAL A directional modifier that follows the street name.. Example WEST in 3715 TENTH AVENUE WEST. - - - ruralroute - - is text (token number 8): RURAL ROUTE . Example 8 in RR 7. - - - extra - - is text: Extra information like Floor number. 
- - - city - - is text (token number 10): Example Boston. - - - state - - is text (token number 11): Example MASSACHUSETTS - - - country - - is text (token number 12): Example USA - - - postcode - - is text POSTAL CODE (ZIP CODE) (token number 13): Example 02109 - - - box - - is text POSTAL BOX NUMBER (token number 14 and 15): Example 02109 - - - unit - - is text Apartment number or Suite Number (token number 17): Example 3B in APT 3B. - - - - - - - - - - - This section lists the PostgreSQL table formats used by the address_standardizer for normalizing addresses. Note that these tables do not need to be named the same as what is referenced here. You can have different lex, gaz, rules tables for each country for example or for your custom geocoder. The names of these tables get passed into the address standardizer functions. - - - - Address Standardizer Tables - - - rules table - The rules table contains a set of rules that maps address input sequence tokens to standardized output sequence - - - Description - A rules table must have at least the following columns, though you are allowed to add more for your own uses. - - - - id - - Primary key of table - - - rule - - text field denoting the rule. Details at PAGC Address Standardizer Rule records. - A rule consists of a set of non-negative integers representing input tokens, terminated by a -1, followed by an equal number of non-negative integers representing postal attributes, terminated by a -1, followed by an integer representing a rule type, followed by an integer representing the rank of the rule. The rules are ranked from 0 (lowest) to 17 (highest). - So for example the rule 2 0 2 22 3 -1 5 5 6 7 3 -1 2 6 maps to sequence of tokens TYPE NUMBER TYPE DIRECT QUALIF to the output sequence STREET STREET SUFTYP SUFDIR QUALIF. The rule is an ARC_C rule of rank 6. - - - - - Each rule has a rule type which is denoted by one of following: - - - MACRO_C - - (token number = "0"). 
The class of rules for parsing MACRO clauses. - - - - MICRO_C - - (token number = "1"). The class of rules for parsing full MICRO clauses (ie ARC_C plus CIVIC_C). These rules are not used in the build phase. - - - - ARC_C - - (token number = "2"). The class of rules for parsing MICRO clauses, excluding the HOUSE attribute. - - - - CIVIC_C - - (token number = "3"). The class of rules for parsing the HOUSE attribute. - - - - EXTRA_C - - (token number = "4"). The class of rules for parsing EXTRA attributes - attributes excluded from geocoding. These rules are not used in the build phase. - - - - - - - - - - lex table - A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations. - - - Description - A lex (short for lexicon) table is used to classify alphanumeric input and associate that input with (a) input tokens and (b) standardized representations. Things you will find in these tables are ONE mapped to stdworkd: 1. - - A lex has at least the following columns in the table. You may add - - - id - - Primary key of table - - - seq - - integer: definition number? - - - - word - - text: the input word - - - stdword - - text: the standardized replacement word - - - token - - integer: the kind of word it is. Only if it is used in this context will it be replaced. Refer to PAGC Tokens. - - - - - - - - - gaz table - A gaz table is used to standardize place names and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations. - - - Description - A gaz (short for gazeteer) table is used to classify place names and associate that input with (a) input tokens and (b) standardized representations. For example if you are in US, you may load these with State Names and associated abbreviations. - - A gaz table has at least the following columns in the table. You may add more columns if you wish for your own purposes. 
- - - id - - Primary key of table - - - seq - - integer: definition number? - - - word - - text: the input word - - - stdword - - text: the standardized replacement word - - - token - - integer: the kind of word it is. Only if it is used in this context will it be replaced. Refer to PAGC Tokens. - - - - - - - - - - - Address Standardizer Functions - - - parse_address - - Takes a 1 line address and breaks into parts - - - - - - record parse_address - text address - - - - - - - Description - - Returns takes as input an address, and returns a record output consisting of fields num, street, street2, - address1, city, state, zip, zipplus, country. - - - Availability: 2.2.0 - &address_standardizer_required; - - - - - Examples - - SELECT num, street, city, zip, zipplus FROM parse_address('1 Devonshire Place, Boston, MA 02109-1234'); - num | street | city | zip | zipplus ------+------------------+--------+-------+--------- - 1 | Devonshire Place | Boston | 02109 | 1234 - - - - - - - See Also - - - - - - - - standardize_address - - Returns an stdaddr form of an input address utilizing lex, gaz, and rule tables. - - - - - - stdaddr standardize_address - text lextab - text gaztab - text rultab - text address - - - - stdaddr standardize_address - text lextab - text gaztab - text rultab - text micro - text macro - - - - - - Description - - Returns an form of an input address utilizing table name, , and table names and an address. 
- - - Availability: 2.2.0 - &address_standardizer_required; - - - - - Examples - - SELECT * FROM standardize_address('tiger.pagc_lex', - 'tiger.pagc_gaz', 'tiger.pagc_rules', 'One Devonshire Place, PH 301, Boston, MA 02109-1234'); - - Make easier to read we'll dump output using hstore extension CREATE EXTENSION hstore; you need to install - SELECT (each(hstore(p))).* - FROM standardize_address('tiger.pagc_lex', 'tiger.pagc_gaz', - 'tiger.pagc_rules', 'One Devonshire Place, PH 301, Boston, MA 02109-1234') As p; - key | value -------------+----------------- - box | - city | BOSTON - name | DEVONSHIRE - qual | - unit | # PENTHOUSE 301 - extra | - state | MA - predir | - sufdir | - country | USA - pretype | - suftype | PL - building | - postcode | 02109 - house_num | 1 - ruralroute | -(16 rows) - - - - - - See Also - - , , , , - - - - - + + + Address Standardizer + This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer). + The address standardizer is a single line address parser that takes an input address and normalizes it based on a set of rules stored in a table and helper lex and gaz tables. + The code is built into a single postgresql extension library called address_standardizer which can be installed with CREATE EXTENSION address_standardizer;. + The code for this extension can be found in the PostGIS extensions/address_standardizer and is currently self-contained. + For installation instructions refer to: . + How the Parser Works + The parser works from right to left looking first at the macro elements + for postcode, state/province, city, and then looks micro elements to determine +if we are dealing with a house number street or intersection or landmark. +It currently does not look for a country code or name, but that could be +introduced in the future. 
+ + + Country code + Assumed to be US or CA based on: postcode as US or Canada state/province as US or Canada else US + + + Postcode/zipcode + These are recognized using Perl compatible regular expressions. +These regexs are currently in the parseaddress-api.c and are relatively +simple to make changes to if needed. + + + State/province + These are recognized using Perl compatible regular expressions. +These regexs are currently in the parseaddress-api.c but could get moved +into includes in the future for easier maintenance. + + + + + + + This section lists the PostgreSQL data types installed by Address Standardizer extension. Note we describe the casting behavior of these which is very + important especially when designing your own functions. + + + + Address Standardizer Types + + + stdaddr + A composite type that consists of the elements of an address. This is the return type for standardize_address function. + + + Description + A composite type that consists of elements of an address. This is the return type for function. Some descriptions for elements are borrowed from PAGC Postal Attributes. + The token numbers denote the reference number in the rules table. + &address_standardizer_required; + + + building + + is text (token number 0): Refers to building number or name. Unparsed building identifiers and types. Generally blank for most addresses. + + + house_num + + is a text (token number 1): This is the street number on a street. Example 75 in 75 State Street. + + + predir + is text (token number 2): STREET NAME PRE-DIRECTIONAL such as North, South, East, West etc. + + qual + + is text (token number 3): STREET NAME PRE-MODIFIER Example OLD in 3715 OLD HIGHWAY 99. + + + pretype + + is text (token number 4): STREET PREFIX TYPE + + + name + + is text (token number 5): STREET NAME + + + suftype + + is text (token number 6): STREET POST TYPE e.g. St, Ave, Cir. A street type following the root street name. Example STREET in 75 State Street. 
+ + + sufdir + + is text (token number 7): STREET POST-DIRECTIONAL A directional modifier that follows the street name.. Example WEST in 3715 TENTH AVENUE WEST. + + + ruralroute + + is text (token number 8): RURAL ROUTE . Example 8 in RR 7. + + + extra + + is text: Extra information like Floor number. + + + city + + is text (token number 10): Example Boston. + + + state + + is text (token number 11): Example MASSACHUSETTS + + + country + + is text (token number 12): Example USA + + + postcode + + is text POSTAL CODE (ZIP CODE) (token number 13): Example 02109 + + + box + + is text POSTAL BOX NUMBER (token number 14 and 15): Example 02109 + + + unit + + is text Apartment number or Suite Number (token number 17): Example 3B in APT 3B. + + + + + + + + + + + This section lists the PostgreSQL table formats used by the address_standardizer for normalizing addresses. Note that these tables do not need to be named the same as what is referenced here. You can have different lex, gaz, rules tables for each country for example or for your custom geocoder. The names of these tables get passed into the address standardizer functions. + + + + Address Standardizer Tables + + + rules table + The rules table contains a set of rules that maps address input sequence tokens to standardized output sequence + + + Description + A rules table must have at least the following columns, though you are allowed to add more for your own uses. + + + + id + + Primary key of table + + + rule + + text field denoting the rule. Details at PAGC Address Standardizer Rule records. + A rule consists of a set of non-negative integers representing input tokens, terminated by a -1, followed by an equal number of non-negative integers representing postal attributes, terminated by a -1, followed by an integer representing a rule type, followed by an integer representing the rank of the rule. The rules are ranked from 0 (lowest) to 17 (highest). 
+ So for example the rule 2 0 2 22 3 -1 5 5 6 7 3 -1 2 6 maps to sequence of tokens TYPE NUMBER TYPE DIRECT QUALIF to the output sequence STREET STREET SUFTYP SUFDIR QUALIF. The rule is an ARC_C rule of rank 6. + + + + + Each rule has a rule type which is denoted by one of following: + + + MACRO_C + + (token number = "0"). The class of rules for parsing MACRO clauses. + + + + MICRO_C + + (token number = "1"). The class of rules for parsing full MICRO clauses (ie ARC_C plus CIVIC_C). These rules are not used in the build phase. + + + + ARC_C + + (token number = "2"). The class of rules for parsing MICRO clauses, excluding the HOUSE attribute. + + + + CIVIC_C + + (token number = "3"). The class of rules for parsing the HOUSE attribute. + + + + EXTRA_C + + (token number = "4"). The class of rules for parsing EXTRA attributes - attributes excluded from geocoding. These rules are not used in the build phase. + + + + + + + + + + lex table + A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations. + + + Description + A lex (short for lexicon) table is used to classify alphanumeric input and associate that input with (a) input tokens and (b) standardized representations. Things you will find in these tables are ONE mapped to stdworkd: 1. + + A lex has at least the following columns in the table. You may add + + + id + + Primary key of table + + + seq + + integer: definition number? + + + + word + + text: the input word + + + stdword + + text: the standardized replacement word + + + token + + integer: the kind of word it is. Only if it is used in this context will it be replaced. Refer to PAGC Tokens. + + + + + + + + + gaz table + A gaz table is used to standardize place names and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations. 
+ + + Description + A gaz (short for gazeteer) table is used to classify place names and associate that input with (a) input tokens and (b) standardized representations. For example if you are in US, you may load these with State Names and associated abbreviations. + + A gaz table has at least the following columns in the table. You may add more columns if you wish for your own purposes. + + + id + + Primary key of table + + + seq + + integer: definition number? + + + word + + text: the input word + + + stdword + + text: the standardized replacement word + + + token + + integer: the kind of word it is. Only if it is used in this context will it be replaced. Refer to PAGC Tokens. + + + + + + + + + + + Address Standardizer Functions + + + parse_address + + Takes a 1 line address and breaks into parts + + + + + + record parse_address + text address + + + + + + + Description + + Returns takes as input an address, and returns a record output consisting of fields num, street, street2, + address1, city, state, zip, zipplus, country. + + + Availability: 2.2.0 + &address_standardizer_required; + + + + + Examples + + SELECT num, street, city, zip, zipplus FROM parse_address('1 Devonshire Place, Boston, MA 02109-1234'); + num | street | city | zip | zipplus +-----+------------------+--------+-------+--------- + 1 | Devonshire Place | Boston | 02109 | 1234 + + + + + + + See Also + + + + + + + + standardize_address + + Returns an stdaddr form of an input address utilizing lex, gaz, and rule tables. + + + + + + stdaddr standardize_address + text lextab + text gaztab + text rultab + text address + + + + stdaddr standardize_address + text lextab + text gaztab + text rultab + text micro + text macro + + + + + + Description + + Returns an form of an input address utilizing table name, , and table names and an address. 
+ + + Availability: 2.2.0 + &address_standardizer_required; + + + + + Examples + + SELECT * FROM standardize_address('tiger.pagc_lex', + 'tiger.pagc_gaz', 'tiger.pagc_rules', 'One Devonshire Place, PH 301, Boston, MA 02109-1234'); + + Make easier to read we'll dump output using hstore extension CREATE EXTENSION hstore; you need to install + SELECT (each(hstore(p))).* + FROM standardize_address('tiger.pagc_lex', 'tiger.pagc_gaz', + 'tiger.pagc_rules', 'One Devonshire Place, PH 301, Boston, MA 02109-1234') As p; + key | value +------------+----------------- + box | + city | BOSTON + name | DEVONSHIRE + qual | + unit | # PENTHOUSE 301 + extra | + state | MA + predir | + sufdir | + country | USA + pretype | + suftype | PL + building | + postcode | 02109 + house_num | 1 + ruralroute | +(16 rows) + + + + + + See Also + + , , , , + + + + + -- 2.50.1
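
Reviewer note: the rule format described in the patched documentation (input tokens, a -1 terminator, an equal number of postal attributes, a second -1, then a rule type and a rank) can be sketched in a few lines of Python. This is only an illustration of the text format as the documentation describes it, not the extension's actual parser; the names `Rule`, `parse_rule` and `RULE_TYPES` are hypothetical, and the type codes are taken from the rule type list in the documentation.

```python
from dataclasses import dataclass

# Rule type codes as listed in the documentation text (assumed complete).
RULE_TYPES = {0: "MACRO_C", 1: "MICRO_C", 2: "ARC_C", 3: "CIVIC_C", 4: "EXTRA_C"}


@dataclass
class Rule:
    input_tokens: list   # input token numbers, in order
    output_attrs: list   # postal attribute numbers, one per input token
    rule_type: str       # symbolic name looked up in RULE_TYPES
    rank: int            # 0 (lowest) to 17 (highest), per the documentation


def parse_rule(text):
    """Split a rule string into its four parts (hypothetical helper)."""
    nums = [int(part) for part in text.split()]

    # The two -1 values terminate the input and output sequences.
    first_end = nums.index(-1)
    second_end = nums.index(-1, first_end + 1)

    input_tokens = nums[:first_end]
    output_attrs = nums[first_end + 1:second_end]
    tail = nums[second_end + 1:]

    if len(input_tokens) != len(output_attrs):
        raise ValueError("input and output sequences must have equal length")
    if len(tail) != 2:
        raise ValueError("expected a rule type and a rank after the second -1")

    type_code, rank = tail
    if type_code not in RULE_TYPES:
        raise ValueError(f"unknown rule type code: {type_code}")
    if not 0 <= rank <= 17:
        raise ValueError(f"rank out of range: {rank}")

    return Rule(input_tokens, output_attrs, RULE_TYPES[type_code], rank)


# The example rule quoted in the documentation.
rule = parse_rule("2 0 2 22 3 -1 5 5 6 7 3 -1 2 6")
print(rule)
```

Parsing the documentation's own example yields five input tokens, five output attributes, rule type ARC_C and rank 6, which agrees with the description given in the rules table section.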