From: Regina Obe
Date: Wed, 9 Sep 2015 06:30:28 +0000 (+0000)
Subject: Flesh out the rules table description and how to create rules
X-Git-Tag: 2.2.0rc1~44
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=36bfb490217253f86f93db0999507183d7937cf0;p=postgis

Flesh out the rules table description and how to create rules

git-svn-id: http://svn.osgeo.org/postgis/trunk@14056 b70326c6-7e19-0410-871a-916f4a2858ee
---

diff --git a/doc/extras_address_standardizer.xml b/doc/extras_address_standardizer.xml
index 7c0cd2cc8..0678a4543 100644
--- a/doc/extras_address_standardizer.xml
+++ b/doc/extras_address_standardizer.xml
@@ -147,7 +147,7 @@ into includes in the future for easier maintenance.
 rules table
- The rules table contains a set of rules that maps address input sequence tokens to standardized output sequence
+ The rules table contains a set of rules that maps address input sequence tokens to standardized output sequence. A rule is defined as a set of input tokens followed by -1 (terminator) followed by set of output tokens followed by -1 followed by number denoting kind of rule followed by ranking of rule.
 Description
@@ -352,42 +352,127 @@ into includes in the future for easier maintenance.
 Output Tokens
- After the first -1 (terminator), follows the output tokens and their order, followed by a terminator -1. Numbers for corresponding output tokens are listed in .
+ After the first -1 (terminator), follows the output tokens and their order, followed by a terminator -1. Numbers for corresponding output tokens are listed in . What are allowed is dependent on kind of rule. Output tokens valid for each rule type are listed in .
- Rule Types and Rank
+ Rule Types and Rank
 The final part of the rule is the rule type which is denoted by one of the following, followed by a rule rank. The rules are ranked from 0 (lowest) to 17 (highest).
-
-
- MACRO_C
-
- (token number = "0").
The class of rules for parsing MACRO clauses such as PLACE STATE ZIP
-
-
-
- MICRO_C
-
- (token number = "1"). The class of rules for parsing full MICRO clauses (such as House, street, sufdir, predir, pretyp, suftype, qualif) (ie ARC_C plus CIVIC_C). These rules are not used in the build phase.
-
-
-
- ARC_C
-
- (token number = "2"). The class of rules for parsing MICRO clauses, excluding the HOUSE attribute.
-
-
-
- CIVIC_C
-
- (token number = "3"). The class of rules for parsing the HOUSE attribute.
-
-
-
- EXTRA_C
-
- (token number = "4"). The class of rules for parsing EXTRA attributes - attributes excluded from geocoding. These rules are not used in the build phase.
-
-
-
+
+ MACRO_C
+ (token number = "0"). The class of rules for parsing MACRO clauses such as PLACE STATE ZIP
+ MACRO_C output tokens (excerpted from http://www.pagcgeo.org/docs/html/pagc-12.html#--r-typ--).
+
+
+ CITY
+
+ (token number "10"). Example "Albany"
+
+
+
+ STATE
+
+ (token number "11"). Example "NY"
+
+
+
+ NATION
+
+ (token number "12"). This attribute is not used in most reference files. Example "USA"
+
+
+
+ POSTAL
+
+ (token number "13"). (SADS elements "ZIP CODE", "PLUS 4"). This attribute is used for both the US Zip and the Canadian Postal Codes.
+
+
+
+
+ MICRO_C
+ (token number = "1"). The class of rules for parsing full MICRO clauses (such as House, street, sufdir, predir, pretyp, suftype, qualif) (ie ARC_C plus CIVIC_C). These rules are not used in the build phase.
+ MICRO_C output tokens (excerpted from http://www.pagcgeo.org/docs/html/pagc-12.html#--r-typ--).
+
+ HOUSE
+
+ is text (token number 1): This is the street number on a street. Example 75 in 75 State Street.
+
+
+ predir
+ is text (token number 2): STREET NAME PRE-DIRECTIONAL such as North, South, East, West etc.
+
+ qual
+
+ is text (token number 3): STREET NAME PRE-MODIFIER Example OLD in 3715 OLD HIGHWAY 99.
+
+
+ pretype
+
+ is text (token number 4): STREET PREFIX TYPE
+
+
+ street
+
+ is text (token number 5): STREET NAME
+
+
+ suftype
+
+ is text (token number 6): STREET POST TYPE e.g. St, Ave, Cir. A street type following the root street name. Example STREET in 75 State Street.
+
+
+ sufdir
+
+ is text (token number 7): STREET POST-DIRECTIONAL A directional modifier that follows the street name. Example WEST in 3715 TENTH AVENUE WEST.
+
+
+
+
+ ARC_C
+ (token number = "2"). The class of rules for parsing MICRO clauses, excluding the HOUSE attribute. As such uses same set of output tokens as MICRO_C minus the HOUSE token.
+
+ CIVIC_C
+ (token number = "3"). The class of rules for parsing the HOUSE attribute.
+
+ EXTRA_C
+ (token number = "4"). The class of rules for parsing EXTRA attributes - attributes excluded from geocoding. These rules are not used in the build phase.
+
+ EXTRA_C output tokens (excerpted from http://www.pagcgeo.org/docs/html/pagc-12.html#--r-typ--).
+
+ BLDNG
+
+ (token number 0): Unparsed building identifiers and types.
+
+
+ BOXH
+
+ (token number 14): The BOX in BOX 3B
+
+
+ BOXT
+
+ (token number 15): The 3B in BOX 3B
+
+
+ RR
+
+ (token number 8): The RR in RR 7
+
+
+ UNITH
+
+ (token number 16): The APT in APT 3B
+
+
+ UNITT
+
+ (token number 17): The 3B in APT 3B
+
+
+ UNKNWN
+
+ (token number 9): An otherwise unclassified output.
+
+
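
The rule layout this patch documents (input tokens, -1, output tokens, -1, rule type, rank) can be illustrated with a small reader. This is only a sketch in Python and is not part of PostGIS or PAGC: `parse_rule`, `Rule`, and the lookup tables are hypothetical names, the rule string in the usage line is made up, and the input token numbers are left uninterpreted because their table is outside this diff. The output token numbers and the per-type allowed sets are copied from the added text above, except for CIVIC_C, where allowing only HOUSE is an assumption drawn from its description.

```python
from dataclasses import dataclass
from typing import List

# Rule type numbers as listed in the patch.
RULE_TYPES = {0: "MACRO_C", 1: "MICRO_C", 2: "ARC_C", 3: "CIVIC_C", 4: "EXTRA_C"}

# Output token numbers as listed in the patch.
OUTPUT_TOKENS = {
    0: "BLDNG", 1: "HOUSE", 2: "predir", 3: "qual", 4: "pretype",
    5: "street", 6: "suftype", 7: "sufdir", 8: "RR", 9: "UNKNWN",
    10: "CITY", 11: "STATE", 12: "NATION", 13: "POSTAL",
    14: "BOXH", 15: "BOXT", 16: "UNITH", 17: "UNITT",
}

# Output tokens valid for each rule type, per the patch text.
ALLOWED = {
    "MACRO_C": {10, 11, 12, 13},
    "MICRO_C": {1, 2, 3, 4, 5, 6, 7},
    "ARC_C": {2, 3, 4, 5, 6, 7},      # MICRO_C minus HOUSE
    "CIVIC_C": {1},                    # assumption: HOUSE only
    "EXTRA_C": {0, 8, 9, 14, 15, 16, 17},
}


@dataclass
class Rule:
    inputs: List[int]      # input token numbers, left uninterpreted here
    outputs: List[str]     # output token names
    rule_type: str
    rank: int


def parse_rule(rule: str) -> Rule:
    """Split a rule string into its four parts and check them."""
    nums = [int(part) for part in rule.split()]
    first = nums.index(-1)               # ValueError if no terminator
    second = nums.index(-1, first + 1)   # ValueError if only one terminator
    inputs = nums[:first]
    outputs = nums[first + 1:second]
    tail = nums[second + 1:]
    if len(tail) != 2:
        raise ValueError("expected rule type and rank after the second -1")
    type_no, rank = tail
    if type_no not in RULE_TYPES:
        raise ValueError(f"unknown rule type {type_no}")
    if not 0 <= rank <= 17:
        raise ValueError(f"rank {rank} is outside 0..17")
    rule_type = RULE_TYPES[type_no]
    for tok in outputs:
        if tok not in ALLOWED[rule_type]:
            raise ValueError(f"output token {tok} is not valid for {rule_type}")
    return Rule(inputs, [OUTPUT_TOKENS[t] for t in outputs], rule_type, rank)


# Made-up rule: two input tokens mapped to CITY and STATE, MACRO_C, rank 7.
example = parse_rule("1 22 -1 10 11 -1 0 7")
print(example)
```

Checking the output tokens against the rule type mirrors the sentence added in this patch that what is allowed depends on the kind of rule. A real rules table may impose further constraints that this sketch does not model.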