1 <chapter Id="typeconv">
2 <title>Type Conversion</title>
5 <acronym>SQL</acronym> queries can, intentionally or not, require
6 mixing of different data types in the same expression.
7 <productname>Postgres</productname> has extensive facilities for
8 evaluating mixed-type expressions.
12 In many cases a user will not need
13 to understand the details of the type conversion mechanism.
14 However, the implicit conversions done by <productname>Postgres</productname>
15 can affect the apparent results of a query, and these results
16 can be tailored by a user or programmer
17 using <emphasis>explicit</emphasis> type coercion.
21 This chapter introduces the <productname>Postgres</productname>
22 type conversion mechanisms and conventions.
23 Refer to the relevant sections in the User's Guide and Programmer's Guide
24 for more information on specific data types and allowed functions and operators.
28 The Programmer's Guide has more details on the exact algorithms used for
29 implicit type conversion and coercion.
33 <title>Overview</title>
36 <acronym>SQL</acronym> is a strongly typed language. That is, every data item
37 has an associated data type which determines its behavior and allowed usage.
38 <productname>Postgres</productname> has an extensible type system which is
39 much more general and flexible than other <acronym>RDBMS</acronym> implementations.
40 Hence, most type conversion behavior in <productname>Postgres</productname>
41 should be governed by general rules rather than by ad-hoc heuristics to allow
42 mixed-type expressions to be meaningful, even with user-defined types.
46 The <productname>Postgres</productname> scanner/parser decodes lexical elements
47 into only five fundamental categories: integers, floats, strings, names, and keywords.
48 Most extended types are first tokenized into strings. The <acronym>SQL</acronym>
49 language definition allows specifying type names with strings, and this mechanism
50 is used by <productname>Postgres</productname>
51 to start the parser down the correct path. For example, the query
54 tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
61 has two strings, of type <type>text</type> and <type>point</type>.
62 If a type is not specified, then the placeholder type <type>unknown</type>
63 is assigned initially, to be resolved in later stages as described below.
67 There are four fundamental <acronym>SQL</acronym> constructs requiring
68 distinct type conversion rules in the <productname>Postgres</productname>
79 <productname>Postgres</productname> allows expressions with
80 left- and right-unary (one argument) operators,
81 as well as binary (two argument) operators.
91 Much of the <productname>Postgres</productname> type system is built around a rich set of
92 functions. Function calls have one or more arguments which, for any specific query,
93 must be matched to the functions available in the system catalog.
103 <acronym>SQL</acronym> INSERT statements place the results of query into a table. The expressions
104 in the query must be matched up with, and perhaps converted to, the target columns of the insert.
114 Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
115 of each SELECT clause must be matched up and converted to a uniform set.
122 Many of the general type conversion rules use simple conventions built on
123 the <productname>Postgres</productname> function and operator system tables.
124 There are some heuristics included in the conversion rules to better support
125 conventions for the <acronym>SQL92</acronym> standard native types such as
126 <type>smallint</type>, <type>integer</type>, and <type>float</type>.
130 The <productname>Postgres</productname> parser uses the convention that all
131 type conversion functions take a single argument of the source type and are
132 named with the same name as the target type. Any function meeting this
133 criteria is considered to be a valid conversion function, and may be used
134 by the parser as such. This simple assumption gives the parser the power
135 to explore type conversion possibilities without hardcoding, allowing
136 extended user-defined types to use these same features transparently.
140 An additional heuristic is provided in the parser to allow better guesses
141 at proper behavior for <acronym>SQL</acronym> standard types. There are
142 five categories of types defined: boolean, string, numeric, geometric,
143 and user-defined. Each category, with the exception of user-defined, has
144 a "preferred type" which is used to resolve ambiguities in candidates.
145 Each "user-defined" type is its own "preferred type", so ambiguous
146 expressions (those with multiple candidate parsing solutions)
147 with only one user-defined type can resolve to a single best choice, while those with
148 multiple user-defined types will remain ambiguous and throw an error.
152 Ambiguous expressions which have candidate solutions within only one type category are
153 likely to resolve, while ambiguous expressions with candidates spanning multiple
154 categories are likely to throw an error and ask for clarification from the user.
158 <title>Guidelines</title>
161 All type conversion rules are designed with several principles in mind:
163 <itemizedlist mark="bullet" spacing="compact">
166 Implicit conversions should never have surprising or unpredictable outcomes.
172 User-defined types, of which the parser has no a-priori knowledge, should be
173 "higher" in the type hierarchy. In mixed-type expressions, native types shall always
174 be converted to a user-defined type (of course, only if conversion is necessary).
180 User-defined types are not related. Currently, <productname>Postgres</productname>
181 does not have information available to it on relationships between types, other than
182 hardcoded heuristics for built-in types and implicit relationships based on available functions
189 There should be no extra overhead from the parser or executor
190 if a query does not need implicit type conversion.
191 That is, if a query is well formulated and the types already match up, then the query should proceed
192 without spending extra time in the parser and without introducing unnecessary implicit conversion
193 functions into the query.
197 Additionally, if a query usually requires an implicit conversion for a function, and
198 if then the user defines an explicit function with the correct argument types, the parser
199 should use this new function and will no longer do the implicit conversion using the old function.
208 <title>Operators</title>
211 <title>Conversion Procedure</title>
214 <title>Operator Evaluation</title>
217 <step performance="required">
219 Check for an exact match in the pg_operator system catalog.
223 <step performance="optional">
225 If one argument of a binary operator is <type>unknown</type>,
226 then assume it is the same type as the other argument.
229 <step performance="required">
231 Reverse the arguments, and look for an exact match with an operator which
232 points to itself as being commutative.
233 If found, then reverse the arguments in the parse tree and use this operator.
239 <step performance="required">
241 Look for the best match.
244 <step performance="optional">
246 Make a list of all operators of the same name.
249 <step performance="required">
251 If only one operator is in the list, use it if the input type can be coerced,
252 and throw an error if the type cannot be coerced.
255 <step performance="required">
257 Keep all operators with the most explicit matches for types. Keep all if there
258 are no explicit matches and move to the next step.
259 If only one candidate remains, use it if the type can be coerced.
262 <step performance="required">
264 If any input arguments are "unknown", categorize the input candidates as
265 boolean, numeric, string, geometric, or user-defined. If there is a mix of
266 categories, or more than one user-defined type, throw an error because
267 the correct choice cannot be deduced without more clues.
268 If only one category is present, then assign the "preferred type"
269 to the input column which had been previously "unknown".
272 <step performance="required">
274 Choose the candidate with the most exact type matches, and which matches
275 the "preferred type" for each column category from the previous step.
276 If there is still more than one candidate, or if there are none,
286 <title>Examples</title>
289 <title>Exponentiation Operator</title>
292 There is only one exponentiation
293 operator defined in the catalog, and it takes <type>float8</type> arguments.
294 The scanner assigns an initial type of <type>int4</type> to both arguments
295 of this query expression:
297 tgl=> select 2 ^ 3 AS "Exp";
304 So the parser does a type conversion on both operands and the query
308 tgl=> select float8(2) ^ float8(3) AS "Exp";
318 tgl=> select 2.0 ^ 3.0 AS "Exp";
327 This last form has the least overhead, since no functions are called to do
328 implicit type conversion. This is not an issue for small queries, but may
329 have an impact on the performance of queries involving large tables.
336 <title>String Concatenation</title>
339 A string-like syntax is used for working with string types as well as for
340 working with complex extended types.
341 Strings with unspecified type are matched with likely operator candidates.
345 One unspecified argument:
347 tgl=> SELECT text 'abc' || 'def' AS "Text and Unknown";
356 In this case the parser looks to see if there is an operator taking <type>text</type>
357 for both arguments. Since there is, it assumes that the second argument should
358 be interpreted as of type <type>text</type>.
362 Concatenation on unspecified types:
364 tgl=> SELECT 'abc' || 'def' AS "Unspecified";
373 In this case there is no initial hint for which type to use, since no types
374 are specified in the query. So, the parser looks for all candidate operators
375 and finds that all arguments for all the candidates are string types. It chooses
376 the "preferred type" for strings, <type>text</type>, for this query.
381 If a user defines a new type and defines an operator <quote>||</quote> to work
382 with it, then this query would no longer succeed as written. The parser would
383 now have candidate types from two categories, and could not decide which to use.
389 <title>Factorial</title>
392 This example illustrates an interesting result. Traditionally, the
393 factorial operator is defined for integers only. The <productname>Postgres</productname>
394 operator catalog has only one entry for factorial, taking an integer operand.
395 If given a non-integer numeric argument, <productname>Postgres</productname>
396 will try to convert that argument to an integer for evaluation of the
400 tgl=> select (4.3 !);
409 Of course, this leads to a mathematically suspect result,
410 since in principle the factorial of a non-integer is not defined.
411 However, the role of a database is not to teach mathematics, but
412 to be a tool for data manipulation. If a user chooses to take the
413 factorial of a floating point number, <productname>Postgres</productname>
423 <title>Functions</title>
426 <title>Function Evaluation</title>
428 <step performance="required">
430 Check for an exact match in the pg_proc system catalog.
432 <step performance="required">
434 Look for the best match.
437 <step performance="required">
439 Make a list of all functions of the same name with the same number of arguments.
441 <step performance="required">
443 If only one function is in the list, use it if the input types can be coerced,
444 and throw an error if the types cannot be coerced.
446 <step performance="required">
448 Keep all functions with the most explicit matches for types. Keep all if there
449 are no explicit matches and move to the next step.
450 If only one candidate remains, use it if the type can be coerced.
452 <step performance="required">
454 If any input arguments are "unknown", categorize the input candidate arguments as
455 boolean, numeric, string, geometric, or user-defined. If there is a mix of
456 categories, or more than one user-defined type, throw an error because
457 the correct choice cannot be deduced without more clues.
458 If only one category is present, then assign the "preferred type"
459 to the input column which had been previously "unknown".
461 <step performance="required">
463 Choose the candidate with the most exact type matches, and which matches
464 the "preferred type" for each column category from the previous step.
465 If there is still more than one candidate, or if there are none,
472 <title>Examples</title>
475 <title>Factorial Function</title>
478 There is only one factorial function defined in the pg_proc catalog.
479 So the following query automatically converts the <type>int2</type> argument
480 to <type>int4</type>:
483 tgl=> select int4fac(int2 '4');
490 and is actually transformed by the parser to
492 tgl=> select int4fac(int4(int2 '4'));
502 <title>Substring Function</title>
505 There are two <function>substr</function> functions declared in pg_proc. However,
506 only one takes two arguments, of types <type>text</type> and <type>int4</type>.
510 If called with a string constant of unspecified type, the type is matched up
511 directly with the only candidate function type:
513 tgl=> select substr('1234', 3);
522 If the string is declared to be of type <type>varchar</type>, as might be the case
523 if it comes from a table, then the parser will try to coerce it to become <type>text</type>:
525 tgl=> select substr(varchar '1234', 3);
531 which is transformed by the parser to become
533 tgl=> select substr(text(varchar '1234'), 3);
542 There are some heuristics in the parser to optimize the relationship between the
543 <type>char</type>, <type>varchar</type>, and <type>text</type> types.
544 For this case, <function>substr</function> is called directly with the <type>varchar</type> string
545 rather than inserting an explicit conversion call.
550 And, if the function is called with an <type>int4</type>, the parser will
551 try to convert that to <type>text</type>:
553 tgl=> select substr(1234, 3);
561 tgl=> select substr(text(1234), 3);
573 <title>Query Targets</title>
576 <title>Target Evaluation</title>
578 <step performance="required">
580 Check for an exact match with the target.
582 <step performance="required">
584 Try to coerce the expression directly to the target type if necessary.
587 <step performance="required">
589 If the target is a fixed-length type (e.g. <type>char</type> or <type>varchar</type>
590 declared with a length) then try to find a sizing function of the same name
591 as the type taking two arguments, the first the type name and the second an
598 <title>Examples</title>
601 <title><type>varchar</type> Storage</title>
604 For a target column declared as <type>varchar(4)</type> the following query
605 ensures that the target is sized correctly:
608 tgl=> CREATE TABLE vv (v varchar(4));
610 tgl=> INSERT INTO vv SELECT 'abc' || 'def';
612 tgl=> select * from vv;
624 <title>UNION Queries</title>
627 The UNION construct is somewhat different in that it must match up
628 possibly dissimilar types to become a single result set.
631 <title>UNION Evaluation</title>
633 <step performance="required">
635 Check for identical types for all results.
638 <step performance="required">
640 Coerce each result from the UNION clauses to match the type of the
641 first SELECT clause or the target column.
646 <title>Examples</title>
649 <title>Underspecified Types</title>
653 tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
664 <title>Simple UNION</title>
668 tgl=> SELECT 1.2 AS Float8 UNION SELECT 1;
679 <title>Transposed UNION</title>
682 The types of the union are forced to match the types of
683 the first/top clause in the union:
686 tgl=> SELECT 1 AS "All integers"
687 tgl-> UNION SELECT '2.2'::float4
688 tgl-> UNION SELECT 3.3;
698 An alternate parser strategy could be to choose the "best" type of the bunch, but
699 this is more difficult because of the nice recursion technique used in the
700 parser. However, the "best" type is used when selecting <emphasis>into</emphasis>
704 tgl=> CREATE TABLE ff (f float);
708 tgl-> UNION SELECT '2.2'::float4
709 tgl-> UNION SELECT 3.3;
711 tgl=> SELECT f AS "Floating point" from ff;