From 402027aab0fa2566d8b0f32aab4a4c0759f78bbf Mon Sep 17 00:00:00 2001
From: Regina Obe
Date: Sat, 25 Oct 2014 08:52:35 +0000
Subject: [PATCH] fill in remaining tokens

git-svn-id: http://svn.osgeo.org/postgis/trunk@13112 b70326c6-7e19-0410-871a-916f4a2858ee
---
 doc/extras_address_standardizer.xml | 122 ++++++++++++++++++++++++++--
 1 file changed, 116 insertions(+), 6 deletions(-)

diff --git a/doc/extras_address_standardizer.xml b/doc/extras_address_standardizer.xml
index 07b022791..746c68cc2 100644
--- a/doc/extras_address_standardizer.xml
+++ b/doc/extras_address_standardizer.xml
@@ -170,7 +170,7 @@ into includes in the future for easier maintenance.
- Input Tokens
+ Input Tokens
 Each rule starts with a set of input tokens followed by a terminator -1. Valid input tokens excerpted from PAGC Input Tokens are as follows:
 Form-Based Input Tokens
@@ -236,8 +236,118 @@ into includes in the future for easier maintenance.
 (1). A word is a string of letters of arbitrary length. A single letter can be both a SINGLE and a WORD.
+ Function-based Input Tokens
+ BOXH
+ (14). Words used to denote post office boxes. For example: Box or PO Box.
+ BUILDH
+ (19). Words used to denote buildings or building complexes, usually as a prefix. For example: Tower in Tower 7A.
+ BUILDT
+ (24). Words and abbreviations used to denote buildings or building complexes, usually as a suffix. For example: Shopping Centre.
+ DIRECT
+ (22). Words used to denote directions. For example: North.
+ MILE
+ (20). Words used to denote milepost addresses.
+ ROAD
+ (6). Words and abbreviations used to denote highways and roads. For example: the Interstate in Interstate 5.
+ RR
+ (8). Words and abbreviations used to denote rural routes. For example: RR.
+ TYPE
+ (2). Words and abbreviations used to denote street types. For example: ST or AVE.
+ UNITH
+ (16). Words and abbreviations used to denote internal subaddresses.
+ For example, APT or UNIT.
+ Postal Type Input Tokens
+ QUINT
+ (28). A 5-digit number. Identifies a ZIP Code.
+ QUAD
+ (29). A 4-digit number. Identifies ZIP4.
+ PCH
+ (27). A 3-character sequence of letter, number, letter. Identifies an FSA, the first 3 characters of a Canadian postal code.
+ PCT
+ (26). A 3-character sequence of number, letter, number. Identifies an LDU, the last 3 characters of a Canadian postal code.
+ Stopwords
+ STOPWORDs combine with WORDs. In rules, a string of multiple WORDs and STOPWORDs is represented by a single WORD token.
+ STOPWORD
+ (7). A word with low lexical significance that can be omitted in parsing. For example: THE.
 Output Tokens
@@ -284,11 +394,11 @@ into includes in the future for easier maintenance.
 lex table
- A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations.
+ A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations.
 Description
- A lex (short for lexicon) table is used to classify alphanumeric input and associate that input with (a) input tokens and (b) standardized representations. Things you will find in these tables are ONE mapped to stdworkd: 1.
+ A lex (short for lexicon) table is used to classify alphanumeric input and associate that input with (a) input tokens and (b) standardized representations. Things you will find in these tables are ONE mapped to stdword: 1.
 A lex has at least the following columns in the table. You may add
@@ -326,11 +436,11 @@ into includes in the future for easier maintenance.
 gaz table
- A gaz table is used to standardize place names and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations.
+ A gaz table is used to standardize place names and associate that input with (a) input tokens ( See Input Tokens) and (b) standardized representations.
 Description
- A gaz (short for gazeteer) table is used to classify place names and associate that input with (a) input tokens and (b) standardized representations. For example if you are in US, you may load these with State Names and associated abbreviations.
+ A gaz (short for gazetteer) table is used to classify place names and associate that input with (a) input tokens and (b) standardized representations. For example, if you are in the US, you may load these with State Names and associated abbreviations.
 A gaz table has at least the following columns in the table. You may add more columns if you wish for your own purposes.
@@ -342,7 +452,7 @@ into includes in the future for easier maintenance.
 seq
- integer: definition number?
+ integer: definition number? - identifier used for that instance of the word
 word
-- 
2.40.0