1 <chapter Id="typeconv">
2 <title>Type Conversion</title>
5 <acronym>SQL</acronym> queries can, intentionally or not, require
6 mixing of different data types in the same expression.
7 <productname>Postgres</productname> has extensive facilities for
8 evaluating mixed-type expressions.
11 In many cases a user will not need
12 to understand the details of the type conversion mechanism.
13 However, the implicit conversions done by <productname>Postgres</productname>
14 can affect the apparent results of a query, and these results
15 can be tailored by a user or programmer
16 using <emphasis>explicit</emphasis> type coersion.
19 This chapter introduces the <productname>Postgres</productname>
20 type conversion mechanisms and conventions.
21 Refer to the relevant sections in the User's Guide and Programmer's Guide
22 for more information on specific data types and allowed functions and operators.
25 The Programmer's Guide has more details on the exact algorithms used for
26 implicit type conversion and coersion.
29 <title>Overview</title>
32 <acronym>SQL</acronym> is a strongly typed language. That is, every data item
33 has an associated data type which determines its behavior and allowed usage.
34 <productname>Postgres</productname> has an extensible type system which is
35 much more general and flexible than other <acronym>RDBMS</acronym> implementations.
36 Hence, most type conversion behavior in <productname>Postgres</productname>
37 should be governed by general rules rather than by ad-hoc heuristics to allow
38 mixed-type expressions to be meaningful, even with user-defined types.
41 The <productname>Postgres</productname> scanner/parser decodes lexical elements
42 into only five fundamental categories: integers, floats, strings, names, and keywords.
43 Most extended types are first tokenized into strings. The <acronym>SQL</acronym>
44 language definition allows specifying type names with strings, and this mechanism
45 is used by <productname>Postgres</productname>
46 to start the parser down the correct path. For example, the query
49 tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
56 has two strings, of type <type>text</type> and <type>point</type>.
57 If a type is not specified, then the placeholder type <type>unknown</type>
58 is assigned initially, to be resolved in later stages as described below.
61 There are four fundamental <acronym>SQL</acronym> constructs requiring
62 distinct type conversion rules in the <productname>Postgres</productname>
72 <productname>Postgres</productname> allows expressions with
73 left- and right-unary (one argument) operators,
74 as well as binary (two argument) operators.
83 Much of the <productname>Postgres</productname> type system is built around a rich set of
84 functions. Function calls have one or more arguments which, for any specific query,
85 must be matched to the functions available in the system catalog.
94 <acronym>SQL</acronym> INSERT statements place the results of query into a table. The expressions
95 in the query must be matched up with, and perhaps converted to, the target columns of the insert.
104 Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
105 of each SELECT clause must be matched up and converted to a uniform set.
111 Many of the general type conversion rules use simple conventions built on
112 the <productname>Postgres</productname> function and operator system tables.
113 There are some heuristics included in the conversion rules to better support
114 conventions for the <acronym>SQL92</acronym> standard native types such as
115 <type>smallint</type>, <type>integer</type>, and <type>float</type>.
118 The <productname>Postgres</productname> parser uses the convention that all
119 type conversion functions take a single argument of the source type and are
120 named with the same name as the target type. Any function meeting this
121 criteria is considered to be a valid conversion function, and may be used
122 by the parser as such. This simple assumption gives the parser the power
123 to explore type conversion possibilities without hardcoding, allowing
124 extended user-defined types to use these same features transparently.
127 An additional heuristic is provided in the parser to allow better guesses
128 at proper behavior for <acronym>SQL</acronym> standard types. There are
129 five categories of types defined: boolean, string, numeric, geometric,
130 and user-defined. Each category, with the exception of user-defined, has
131 a "preferred type" which is used to resolve ambiguities in candidates.
132 Each "user-defined" type is its own "preferred type", so ambiguous
133 expressions (those with multiple candidate parsing solutions)
134 with only one user-defined type can resolve to a single best choice, while those with
135 multiple user-defined types will remain ambiguous and throw an error.
138 Ambiguous expressions which have candidate solutions within only one type category are
139 likely to resolve, while ambiguous expressions with candidates spanning multiple
140 categories are likely to throw an error and ask for clarification from the user.
143 <title>Guidelines</title>
146 All type conversion rules are designed with several principles in mind:
148 <itemizedlist mark="bullet" spacing="compact">
151 Implicit conversions should never have suprising or unpredictable outcomes.
155 User-defined types, of which the parser has no apriori knowledge, should be
156 "higher" in the type heirarchy. In mixed-type expressions, native types shall always
157 be converted to a user-defined type (of course, only if conversion is necessary).
161 User-defined types are not related. Currently, <productname>Postgres</productname>
162 does not have information available to it on relationships between types, other than
163 hardcoded heuristics for built-in types and implicit relationships based on available functions
168 There should be no extra overhead from the parser or executor
169 if a query does not need implicit type conversion.
170 That is, if a query is well formulated and the types already match up, then the query should proceed
171 without spending extra time in the parser and without introducing unnecessary implicit conversion
172 functions into the query.
175 Additionally, if a query usually requires an implicit conversion for a function, and
176 if then the user defines an explicit function with the correct argument types, the parser
177 should use this new function and will no longer do the implicit conversion using the old function.
181 <title>Operators</title>
184 <title>Conversion Procedure</title>
188 <title>Operator Evaluation</title>
192 <step performance="required">
194 Check for an exact match in the pg_operator system catalog.
197 <step performance="optional">
199 If one argument of a binary operator is <type>unknown</type>,
200 then assume it is the same type as the other argument.
202 <step performance="required">
204 Reverse the arguments, and look for an exact match with an operator which
205 points to itself as being commutative.
206 If found, then reverse the arguments in the parse tree and use this operator.
210 <step performance="required">
212 Look for the best match.
215 <step performance="optional">
217 Make a list of all operators of the same name.
219 <step performance="required">
221 If only one operator is in the list, use it if the input type can be coerced,
222 and throw an error if the type cannot be coerced.
224 <step performance="required">
226 Keep all operators with the most explicit matches for types. Keep all if there
227 are no explicit matches and move to the next step.
228 If only one candidate remains, use it if the type can be coerced.
230 <step performance="required">
232 If any input arguments are "unknown", categorize the input candidates as
233 boolean, numeric, string, geometric, or user-defined. If there is a mix of
234 categories, or more than one user-defined type, throw an error because
235 the correct choice cannot be deduced without more clues.
236 If only one category is present, then assign the "preferred type"
237 to the input column which had been previously "unknown".
239 <step performance="required">
241 Choose the candidate with the most exact type matches, and which matches
242 the "preferred type" for each column category from the previous step.
243 If there is still more than one candidate, or if there are none,
250 <title>Examples</title>
253 <title>Exponentiation Operator</title>
256 There is only one exponentiation
257 operator defined in the catalog, and it takes <type>float8</type> arguments.
258 The scanner assigns an initial type of <type>int4</type> to both arguments
259 of this query expression:
261 tgl=> select 2 ^ 3 AS "Exp";
268 So the parser does a type conversion on both operands and the query
272 tgl=> select float8(2) ^ float8(3) AS "Exp";
282 tgl=> select 2.0 ^ 3.0 AS "Exp";
291 This last form has the least overhead, since no functions are called to do
292 implicit type conversion. This is not an issue for small queries, but may
293 have an impact on the performance of queries involving large tables.
297 <title>String Concatenation</title>
300 A string-like syntax is used for working with string types as well as for
301 working with complex extended types.
302 Strings with unspecified type are matched with likely operator candidates.
305 One unspecified argument:
307 tgl=> SELECT text 'abc' || 'def' AS "Text and Unknown";
315 In this case the parser looks to see if there is an operator taking <type>text</type>
316 for both arguments. Since there is, it assumes that the second argument should
317 be interpreted as of type <type>text</type>.
320 Concatenation on unspecified types:
322 tgl=> SELECT 'abc' || 'def' AS "Unspecified";
330 In this case there is no initial hint for which type to use, since no types
331 are specified in the query. So, the parser looks for all candidate operators
332 and finds that all arguments for all the candidates are string types. It chooses
333 the "preferred type" for strings, <type>text</type>, for this query.
337 If a user defines a new type and defines an operator <quote>||</quote> to work
338 with it, then this query would no longer succeed as written. The parser would
339 now have candidate types from two categories, and could not decide which to use.
343 <title>Factorial</title>
346 This example illustrates an interesting result. Traditionally, the
347 factorial operator is defined for integers only. The <productname>Postgres</productname>
348 operator catalog has only one entry for factorial, taking an integer operand.
349 If given a non-integer numeric argument, <productname>Postgres</productname>
350 will try to convert that argument to an integer for evaluation of the
354 tgl=> select (4.3 !);
363 Of course, this leads to a mathematically suspect result,
364 since in principle the factorial of a non-integer is not defined.
365 However, the role of a database is not to teach mathematics, but
366 to be a tool for data manipulation. If a user chooses to take the
367 factorial of a floating point number, <productname>Postgres</productname>
372 <title>Functions</title>
377 <title>Function Evaluation</title>
379 <step performance="required">
381 Check for an exact match in the pg_proc system catalog.
383 <step performance="required">
385 Look for the best match.
388 <step performance="required">
390 Make a list of all functions of the same name with the same number of arguments.
392 <step performance="required">
394 If only one function is in the list, use it if the input types can be coerced,
395 and throw an error if the types cannot be coerced.
397 <step performance="required">
399 Keep all functions with the most explicit matches for types. Keep all if there
400 are no explicit matches and move to the next step.
401 If only one candidate remains, use it if the type can be coerced.
403 <step performance="required">
405 If any input arguments are "unknown", categorize the input candidate arguments as
406 boolean, numeric, string, geometric, or user-defined. If there is a mix of
407 categories, or more than one user-defined type, throw an error because
408 the correct choice cannot be deduced without more clues.
409 If only one category is present, then assign the "preferred type"
410 to the input column which had been previously "unknown".
412 <step performance="required">
414 Choose the candidate with the most exact type matches, and which matches
415 the "preferred type" for each column category from the previous step.
416 If there is still more than one candidate, or if there are none,
423 <title>Examples</title>
426 <title>Factorial Function</title>
429 There is only one factorial function defined in the pg_proc catalog.
430 So the following query automatically converts the <type>int2</type> argument
431 to <type>int4</type>:
434 tgl=> select int4fac(int2 '4');
441 and is actually transformed by the parser to
443 tgl=> select int4fac(int4(int2 '4'));
451 <title>Substring Function</title>
454 There are two <function>substr</function> functions declared in pg_proc. However,
455 only one takes two arguments, of types <type>text</type> and <type>int4</type>.
458 If called with a string constant of unspecified type, the type is matched up
459 directly with the only candidate function type:
461 tgl=> select substr('1234', 3);
469 If the string is declared to be of type <type>varchar</type>, as might be the case
470 if it comes from a table, then the parser will try to coerce it to become <type>text</type>:
472 tgl=> select substr(varchar '1234', 3);
478 which is transformed by the parser to become
480 tgl=> select substr(text(varchar '1234'), 3);
488 There are some heuristics in the parser to optimize the relationship between the
489 <type>char</type>, <type>varchar</type>, and <type>text</type> types.
490 For this case, <function>substr</function> is called directly with the <type>varchar</type> string
491 rather than inserting an explicit conversion call.
495 And, if the function is called with an <type>int4</type>, the parser will
496 try to convert that to <type>text</type>:
498 tgl=> select substr(1234, 3);
506 tgl=> select substr(text(1234), 3);
514 <title>Query Targets</title>
519 <title>Target Evaluation</title>
521 <step performance="required">
523 Check for an exact match with the target.
525 <step performance="required">
527 Try to coerce the expression directly to the target type if necessary.
529 <step performance="required">
531 If the target is a fixed-length type (e.g. <type>char</type> or <type>varchar</type>
532 declared with a length) then try to find a sizing function of the same name
533 as the type taking two arguments, the first the type name and the second an
539 <title>Examples</title>
542 <title><type>varchar</type> Storage</title>
545 For a target column declared as <type>varchar(4)</type> the following query
546 ensures that the target is sized correctly:
549 tgl=> CREATE TABLE vv (v varchar(4));
551 tgl=> INSERT INTO vv SELECT 'abc' || 'def';
553 tgl=> select * from vv;
562 <title>UNION Queries</title>
565 The UNION construct is somewhat different in that it must match up
566 possibly dissimilar types to become a single result set.
569 <title>UNION Evaluation</title>
571 <step performance="required">
573 Check for identical types for all results.
575 <step performance="required">
577 Coerce each result from the UNION clauses to match the type of the
578 first SELECT clause or the target column.
583 <title>Examples</title>
586 <title>Underspecified Types</title>
590 tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
599 <title>Simple UNION</title>
603 tgl=> SELECT 1.2 AS Float8 UNION SELECT 1;
612 <title>Transposed UNION</title>
615 The types of the union are forced to match the types of
616 the first/top clause in the union:
619 tgl=> SELECT 1 AS "All integers"
620 tgl-> UNION SELECT '2.2'::float4
621 tgl-> UNION SELECT 3.3;
631 An alternate parser strategy could be to choose the "best" type of the bunch, but
632 this is more difficult because of the nice recursion technique used in the
633 parser. However, the "best" type is used when selecting <emphasis>into</emphasis>
637 tgl=> CREATE TABLE ff (f float);
641 tgl-> UNION SELECT '2.2'::float4
642 tgl-> UNION SELECT 3.3;
644 tgl=> SELECT f AS "Floating point" from ff;