1 <!-- doc/src/sgml/fuzzystrmatch.sgml -->
3 <sect1 id="fuzzystrmatch" xreflabel="fuzzystrmatch">
4 <title>fuzzystrmatch</title>
6 <indexterm zone="fuzzystrmatch">
7 <primary>fuzzystrmatch</primary>
11 The <filename>fuzzystrmatch</> module provides several
12 functions to determine similarities and distance between strings.
17 At present, the <function>soundex</>, <function>metaphone</>,
18 <function>dmetaphone</>, and <function>dmetaphone_alt</> functions do
19 not work well with multibyte encodings (such as UTF-8).
24 <title>Soundex</title>
27 The Soundex system is a method of matching similar-sounding names
28 by converting them to the same code. It was initially used by the
29 United States Census in 1880, 1900, and 1910. Note that Soundex
30 is not very useful for non-English names.
34 The <filename>fuzzystrmatch</> module provides two functions
35 for working with Soundex codes:
39 <primary>soundex</primary>
43 <primary>difference</primary>
47 soundex(text) returns text
48 difference(text, text) returns int
52 The <function>soundex</> function converts a string to its Soundex code.
53 The <function>difference</> function converts two strings to their Soundex
54 codes and then reports the number of matching code positions. Since
55 Soundex codes have four characters, the result ranges from zero to four,
56 with zero being no match and four being an exact match. (Thus, the
57 function is misnamed — <function>similarity</> would have been
62 Here are some usage examples:
66 SELECT soundex('hello world!');
68 SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
69 SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
70 SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
72 CREATE TABLE s (nm text);
74 INSERT INTO s VALUES ('john');
75 INSERT INTO s VALUES ('joan');
76 INSERT INTO s VALUES ('wobbly');
77 INSERT INTO s VALUES ('jack');
79 SELECT * FROM s WHERE soundex(nm) = soundex('john');
81 SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
86 <title>Levenshtein</title>
89 This function calculates the Levenshtein distance between two strings:
93 <primary>levenshtein</primary>
97 <primary>levenshtein_less_equal</primary>
101 levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
102 levenshtein(text source, text target) returns int
103 levenshtein_less_equal(text source, text target, int ins_cost, int del_cost, int sub_cost, int max_d) returns int
104 levenshtein_less_equal(text source, text target, int max_d) returns int
108 Both <literal>source</literal> and <literal>target</literal> can be any
109 non-null string, with a maximum of 255 characters. The cost parameters
110 specify how much to charge for a character insertion, deletion, or
111 substitution, respectively. You can omit the cost parameters, as in
112 the second version of the function; in that case they all default to 1.
116 <function>levenshtein_less_equal</function> is an accelerated version of the
117 Levenshtein function for use when only small distances are of interest.
118 If the actual distance is less than or equal to <literal>max_d</>,
119 then <function>levenshtein_less_equal</function> returns the correct
120 distance; otherwise it returns some value greater than <literal>max_d</>.
121 If <literal>max_d</> is negative then the behavior is the same as
122 <function>levenshtein</function>.
130 test=# SELECT levenshtein('GUMBO', 'GAMBOL');
136 test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1);
142 test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',2);
143 levenshtein_less_equal
144 ------------------------
148 test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',4);
149 levenshtein_less_equal
150 ------------------------
157 <title>Metaphone</title>
160 Metaphone, like Soundex, is based on the idea of constructing a
161 representative code for an input string. Two strings are then
162 deemed similar if they have the same codes.
166 This function calculates the metaphone code of an input string:
170 <primary>metaphone</primary>
174 metaphone(text source, int max_output_length) returns text
178 <literal>source</literal> has to be a non-null string with a maximum of
179 255 characters. <literal>max_output_length</literal> sets the maximum
180 length of the output metaphone code; if longer, the output is truncated
189 test=# SELECT metaphone('GUMBO', 4);
198 <title>Double Metaphone</title>
201 The Double Metaphone system computes two <quote>sounds like</> strings
202 for a given input string — a <quote>primary</> and an
203 <quote>alternate</>. In most cases they are the same, but for non-English
204 names especially they can be a bit different, depending on pronunciation.
205 These functions compute the primary and alternate codes:
209 <primary>dmetaphone</primary>
213 <primary>dmetaphone_alt</primary>
217 dmetaphone(text source) returns text
218 dmetaphone_alt(text source) returns text
222 There is no length limit on the input strings.
230 test=# select dmetaphone('gumbo');