1 <chapter Id="typeconv">
2 <title>Type Conversion</title>
4 <sect1 id="typeconv-intro">
5 <title>Introduction</title>
8 <acronym>SQL</acronym> queries can, intentionally or not, require
9 mixing of different data types in the same expression.
10 <productname>PostgreSQL</productname> has extensive facilities for
11 evaluating mixed-type expressions.
15 In many cases a user will not need
16 to understand the details of the type conversion mechanism.
17 However, the implicit conversions done by <productname>PostgreSQL</productname>
18 can affect the results of a query. When necessary, these results
19 can be tailored by a user or programmer
20 using <emphasis>explicit</emphasis> type coercion.
24 This chapter introduces the <productname>PostgreSQL</productname>
25 type conversion mechanisms and conventions.
26 Refer to the relevant sections in <xref linkend="datatype"> and <xref linkend="functions">
27 for more information on specific data types and allowed functions and
32 The <citetitle>Programmer's Guide</citetitle> has more details on the exact algorithms used for
33 implicit type conversion and coercion.
37 <sect1 id="typeconv-overview">
38 <title>Overview</title>
41 <acronym>SQL</acronym> is a strongly typed language. That is, every data item
42 has an associated data type which determines its behavior and allowed usage.
43 <productname>PostgreSQL</productname> has an extensible type system that is
44 much more general and flexible than other <acronym>RDBMS</acronym> implementations.
45 Hence, most type conversion behavior in <productname>PostgreSQL</productname>
46 should be governed by general rules rather than by ad-hoc heuristics, to allow
47 mixed-type expressions to be meaningful even with user-defined types.
51 The <productname>PostgreSQL</productname> scanner/parser decodes lexical
52 elements into only five fundamental categories: integers, floating-point numbers, strings,
53 names, and key words. Most extended types are first tokenized into
54 strings. The <acronym>SQL</acronym> language definition allows specifying type
55 names with strings, and this mechanism can be used in
56 <productname>PostgreSQL</productname> to start the parser down the correct
57 path. For example, the query
60 tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
67 has two literal constants, of type <type>text</type> and <type>point</type>.
68 If a type is not specified for a string literal, then the placeholder type
69 <firstterm>unknown</firstterm> is assigned initially, to be resolved in later
70 stages as described below.
74 There are four fundamental <acronym>SQL</acronym> constructs requiring
75 distinct type conversion rules in the <productname>PostgreSQL</productname>
86 <productname>PostgreSQL</productname> allows expressions with
87 prefix and postfix unary (one-argument) operators,
88 as well as binary (two-argument) operators.
98 Much of the <productname>PostgreSQL</productname> type system is built around a
99 rich set of functions. Function calls have one or more arguments which, for
100 any specific query, must be matched to the functions available in the system
101 catalog. Since <productname>PostgreSQL</productname> permits function
102 overloading, the function name alone does not uniquely identify the function
103 to be called; the parser must select the right function based on the data
104 types of the supplied arguments.
114 <acronym>SQL</acronym> <command>INSERT</command> and <command>UPDATE</command> statements place the results of
115 expressions into a table. The expressions in the query must be matched up
116 with, and perhaps converted to, the types of the target columns.
122 <literal>UNION</literal> and <literal>CASE</literal> constructs
126 Since all select results from a unionized <literal>SELECT</literal> statement must appear in a single
127 set of columns, the types of the results
128 of each <literal>SELECT</> clause must be matched up and converted to a uniform set.
129 Similarly, the result expressions of a <literal>CASE</> construct must be coerced to
130 a common type so that the <literal>CASE</> expression as a whole has a known output type.
137 Many of the general type conversion rules use simple conventions built on
138 the <productname>PostgreSQL</productname> function and operator system tables.
139 There are some heuristics included in the conversion rules to better support
140 conventions for the <acronym>SQL</acronym> standard native types such as
141 <type>smallint</type>, <type>integer</type>, and <type>real</type>.
145 The <productname>PostgreSQL</productname> parser uses the convention that all
146 type conversion functions take a single argument of the source type and are
147 named with the same name as the target type. Any function meeting these
148 criteria is considered to be a valid conversion function, and may be used
149 by the parser as such. This simple assumption gives the parser the power
150 to explore type conversion possibilities without hardcoding, allowing
151 extended user-defined types to use these same features transparently.
155 An additional heuristic is provided in the parser to allow better guesses
156 at proper behavior for <acronym>SQL</acronym> standard types. There are
157 several basic <firstterm>type categories</firstterm> defined: <type>boolean</type>,
158 <type>numeric</type>, <type>string</type>, <type>bitstring</type>, <type>datetime</type>, <type>timespan</type>, <type>geometric</type>, <type>network</type>,
159 and user-defined. Each category, with the exception of user-defined, has
160 a <firstterm>preferred type</firstterm> which is preferentially selected
161 when there is ambiguity.
162 In the user-defined category, each type is its own preferred type.
163 Ambiguous expressions (those with multiple candidate parsing solutions)
164 can often be resolved when there are multiple possible built-in types, but
165 they will raise an error when there are multiple choices for user-defined
170 All type conversion rules are designed with several principles in mind:
172 <itemizedlist mark="bullet" spacing="compact">
175 Implicit conversions should never have surprising or unpredictable outcomes.
181 User-defined types, of which the parser has no a-priori knowledge, should be
182 <quote>higher</quote> in the type hierarchy. In mixed-type expressions, native types shall always
183 be converted to a user-defined type (of course, only if conversion is necessary).
189 User-defined types are not related. Currently, <productname>PostgreSQL</productname>
190 does not have information available to it on relationships between types, other than
191 hardcoded heuristics for built-in types and implicit relationships based on available functions
198 There should be no extra overhead from the parser or executor
199 if a query does not need implicit type conversion.
200 That is, if a query is well formulated and the types already match up, then the query should proceed
201 without spending extra time in the parser and without introducing unnecessary implicit conversion
202 functions into the query.
206 Additionally, if a query usually requires an implicit conversion for a function, and
207 if then the user defines an explicit function with the correct argument types, the parser
208 should use this new function and will no longer do the implicit conversion using the old function.
216 <sect1 id="typeconv-oper">
217 <title>Operators</title>
220 The operand types of an operator invocation are resolved following
221 the procedure below. Note that this procedure is indirectly affected
222 by the precedence of the involved operators. See <xref
223 linkend="sql-precedence"> for more information.
227 <title>Operand Type Resolution</title>
229 <step performance="required">
231 Check for an exact match in the <classname>pg_operator</classname> system catalog.
235 <step performance="optional">
237 If one argument of a binary operator is <type>unknown</type> type,
238 then assume it is the same type as the other argument for this check.
239 Other cases involving <type>unknown</type> will never find a match at
246 <step performance="required">
248 Look for the best match.
251 <step performance="required">
253 Make a list of all operators of the same name for which the input types
254 match or can be coerced to match. (<type>unknown</type> literals are
255 assumed to be coercible to anything for this purpose.) If there is only
256 one, use it; else continue to the next step.
259 <step performance="required">
261 Run through all candidates and keep those with the most exact matches
262 on input types. Keep all candidates if none have any exact matches.
263 If only one candidate remains, use it; else continue to the next step.
266 <step performance="required">
268 Run through all candidates and keep those with the most exact or
269 binary-compatible matches on input types. Keep all candidates if none have
270 any exact or binary-compatible matches.
271 If only one candidate remains, use it; else continue to the next step.
274 <step performance="required">
276 Run through all candidates and keep those that accept preferred types at
277 the most positions where type coercion will be required.
278 Keep all candidates if none accept preferred types.
279 If only one candidate remains, use it; else continue to the next step.
282 <step performance="required">
284 If any input arguments are <quote>unknown</quote>, check the type
285 categories accepted at those argument positions by the remaining
286 candidates. At each position, select the "string" category if any
287 candidate accepts that category (this bias towards string is appropriate
288 since an unknown-type literal does look like a string). Otherwise, if
289 all the remaining candidates accept the same type category, select that
290 category; otherwise fail because the correct choice cannot be deduced
291 without more clues. Also note whether any of the candidates accept a
292 preferred data type within the selected category. Now discard operator
293 candidates that do not accept the selected type category; furthermore,
294 if any candidate accepts a preferred type at a given argument position,
295 discard candidates that accept non-preferred types for that argument.
298 <step performance="required">
300 If only one candidate remains, use it. If no candidate or more than one
309 <bridgehead renderas="sect2">Examples</bridgehead>
312 <title>Exponentiation Operator Type Resolution</title>
315 There is only one exponentiation
316 operator defined in the catalog, and it takes arguments of type
317 <type>double precision</type>.
318 The scanner assigns an initial type of <type>integer</type> to both arguments
319 of this query expression:
321 tgl=> SELECT 2 ^ 3 AS "Exp";
328 So the parser does a type conversion on both operands and the query
332 tgl=> SELECT CAST(2 AS double precision) ^ CAST(3 AS double precision) AS "Exp";
342 tgl=> SELECT 2.0 ^ 3.0 AS "Exp";
351 This last form has the least overhead, since no functions are called to do
352 implicit type conversion. This is not an issue for small queries, but may
353 have an impact on the performance of queries involving large tables.
360 <title>String Concatenation Operator Type Resolution</title>
363 A string-like syntax is used for working with string types as well as for
364 working with complex extended types.
365 Strings with unspecified type are matched with likely operator candidates.
369 An example with one unspecified argument:
371 tgl=> SELECT text 'abc' || 'def' AS "Text and Unknown";
380 In this case the parser looks to see if there is an operator taking <type>text</type>
381 for both arguments. Since there is, it assumes that the second argument should
382 be interpreted as of type <type>text</type>.
386 Concatenation on unspecified types:
388 tgl=> SELECT 'abc' || 'def' AS "Unspecified";
397 In this case there is no initial hint for which type to use, since no types
398 are specified in the query. So, the parser looks for all candidate operators
399 and finds that there are candidates accepting both string-category and
400 bit-string-category inputs. Since string category is preferred when available,
401 that category is selected, and then the
402 <quote>preferred type</quote> for strings, <type>text</type>, is used as the specific
403 type to resolve the unknown literals to.
408 <title>Absolute-Value and Factorial Operator Type Resolution</title>
411 The <productname>PostgreSQL</productname> operator catalog has several
412 entries for the prefix operator <literal>@</>, all of which implement
413 absolute-value operations for various numeric datatypes. One of these
414 entries is for type <type>float8</type>, which is the preferred type in
415 the numeric category. Therefore, <productname>PostgreSQL</productname>
416 will use that entry when faced with a non-numeric input:
418 tgl=> select @ text '-4.5' as "abs";
424 Here the system has performed an implicit text-to-float8 conversion
425 before applying the chosen operator. We can verify that float8 and
426 not some other type was used:
428 tgl=> select @ text '-4.5e500' as "abs";
429 ERROR: Input '-4.5e500' is out of range for float8
434 On the other hand, the postfix operator <literal>!</> (factorial)
435 is defined only for integer datatypes, not for float8. So, if we
436 try a similar case with <literal>!</>, we get:
438 tgl=> select text '44' ! as "factorial";
439 ERROR: Unable to identify a postfix operator '!' for type 'text'
440 You may need to add parentheses or an explicit cast
442 This happens because the system can't decide which of the several
443 possible <literal>!</> operators should be preferred. We can help
444 it out with an explicit cast:
446 tgl=> select cast(text '44' as int8) ! as "factorial";
448 ---------------------
457 <sect1 id="typeconv-func">
458 <title>Functions</title>
461 The argument types of function calls are resolved according to the
466 <title>Function Argument Type Resolution</title>
468 <step performance="required">
470 Check for an exact match in the <classname>pg_proc</classname> system catalog.
471 (Cases involving <type>unknown</type> will never find a match at
474 <step performance="required">
476 If no exact match appears in the catalog, see whether the function call appears
477 to be a trivial type coercion request. This happens if the function call
478 has just one argument and the function name is the same as the (internal)
479 name of some data type. Furthermore, the function argument must be either
480 an unknown-type literal or a type that is binary-compatible with the named
481 data type. When these conditions are met, the function argument is coerced
482 to the named data type without any explicit function call.
485 <step performance="required">
487 Look for the best match.
490 <step performance="required">
492 Make a list of all functions of the same name with the same number of
493 arguments for which the input types
494 match or can be coerced to match. (<type>unknown</type> literals are
495 assumed to be coercible to anything for this purpose.) If there is only
496 one, use it; else continue to the next step.
499 <step performance="required">
501 Run through all candidates and keep those with the most exact matches
502 on input types. Keep all candidates if none have any exact matches.
503 If only one candidate remains, use it; else continue to the next step.
506 <step performance="required">
508 Run through all candidates and keep those with the most exact or
509 binary-compatible matches on input types. Keep all candidates if none have
510 any exact or binary-compatible matches.
511 If only one candidate remains, use it; else continue to the next step.
514 <step performance="required">
516 Run through all candidates and keep those that accept preferred types at
517 the most positions where type coercion will be required.
518 Keep all candidates if none accept preferred types.
519 If only one candidate remains, use it; else continue to the next step.
522 <step performance="required">
524 If any input arguments are <type>unknown</type>, check the type categories accepted
525 at those argument positions by the remaining candidates. At each position,
526 select the <type>string</type> category if any candidate accepts that category
527 (this bias towards string
528 is appropriate since an unknown-type literal does look like a string).
529 Otherwise, if all the remaining candidates accept the same type category,
530 select that category; otherwise fail because
531 the correct choice cannot be deduced without more clues. Also note whether
532 any of the candidates accept a preferred data type within the selected category.
533 Now discard candidates that do not accept the selected type category;
534 furthermore, if any candidate accepts a preferred type at a given argument
535 position, discard candidates that accept non-preferred types for that
539 <step performance="required">
541 If only one candidate remains, use it. If no candidate or more than one
550 <bridgehead renderas="sect2">Examples</bridgehead>
553 <title>Factorial Function Argument Type Resolution</title>
556 There is only one <function>int4fac</function> function defined in the
557 <classname>pg_proc</classname> catalog.
558 So the following query automatically converts the <type>int2</type> argument
559 to <type>int4</type>:
562 tgl=> SELECT int4fac(int2 '4');
569 and is actually transformed by the parser to
571 tgl=> SELECT int4fac(int4(int2 '4'));
581 <title>Substring Function Type Resolution</title>
584 There are two <function>substr</function> functions declared in <classname>pg_proc</classname>. However,
585 only one takes two arguments, of types <type>text</type> and <type>int4</type>.
589 If called with a string constant of unspecified type, the type is matched up
590 directly with the only candidate function type:
592 tgl=> SELECT substr('1234', 3);
601 If the string is declared to be of type <type>varchar</type>, as might be the case
602 if it comes from a table, then the parser will try to coerce it to become <type>text</type>:
604 tgl=> SELECT substr(varchar '1234', 3);
610 which is transformed by the parser to become
612 tgl=> SELECT substr(text(varchar '1234'), 3);
622 Actually, the parser is aware that <type>text</type> and <type>varchar</type>
623 are <firstterm>binary-compatible</>, meaning that one can be passed to a function that
624 accepts the other without doing any physical conversion. Therefore, no
625 explicit type conversion call is really inserted in this case.
631 And, if the function is called with an <type>int4</type>, the parser will
632 try to convert that to <type>text</type>:
634 tgl=> SELECT substr(1234, 3);
640 which actually executes as
642 tgl=> SELECT substr(text(1234), 3);
648 This succeeds because there is a conversion function text(int4) in the
655 <sect1 id="typeconv-query">
656 <title>Query Targets</title>
659 Values to be inserted into a table are coerced to the destination
660 column's datatype according to the
665 <title>Query Target Type Resolution</title>
667 <step performance="required">
669 Check for an exact match with the target.
671 <step performance="required">
673 Otherwise, try to coerce the expression to the target type. This will succeed
674 if the two types are known binary-compatible, or if there is a conversion
675 function. If the expression is an unknown-type literal, the contents of
676 the literal string will be fed to the input conversion routine for the target
680 <step performance="required">
682 If the target is a fixed-length type (e.g. <type>char</type> or <type>varchar</type>
683 declared with a length) then try to find a sizing function for the target
684 type. A sizing function is a function of the same name as the type,
685 taking two arguments of which the first is that type and the second is an
686 integer, and returning the same type. If one is found, it is applied,
687 passing the column's declared length as the second parameter.
693 <title><type>character</type> Storage Type Conversion</title>
696 For a target column declared as <type>character(20)</type> the following query
697 ensures that the target is sized correctly:
700 tgl=> CREATE TABLE vv (v character(20));
702 tgl=> INSERT INTO vv SELECT 'abc' || 'def';
704 tgl=> SELECT v, length(v) FROM vv;
706 ----------------------+--------
711 What has really happened here is that the two unknown literals are resolved
712 to <type>text</type> by default, allowing the <literal>||</literal> operator
713 to be resolved as <type>text</type> concatenation. Then the <type>text</type>
714 result of the operator is coerced to <type>bpchar</type> (<quote>blank-padded
715 char</>, the internal name of the character datatype) to match the target
716 column type. (Since the parser knows that <type>text</type> and
717 <type>bpchar</type> are binary-compatible, this coercion is implicit and does
718 not insert any real function call.) Finally, the sizing function
719 <literal>bpchar(bpchar, integer)</literal> is found in the system catalogs
720 and applied to the operator's result and the stored column length. This
721 type-specific function performs the required length check and addition of
727 <sect1 id="typeconv-union-case">
728 <title><literal>UNION</> and <literal>CASE</> Constructs</title>
731 SQL <literal>UNION</> constructs must match up possibly dissimilar types to
732 become a single result set. The resolution algorithm is applied separately
733 to each output column of a union query. The <literal>INTERSECT</> and
734 <literal>EXCEPT</> constructs resolve dissimilar types in the same way as
736 A <literal>CASE</> construct also uses the identical algorithm to match up its
737 component expressions and select a result datatype.
740 <title><literal>UNION</> and <literal>CASE</> Type Resolution</title>
742 <step performance="required">
744 If all inputs are of type <type>unknown</type>, resolve as type
745 <type>text</type> (the preferred type for string category).
746 Otherwise, ignore the <type>unknown</type> inputs while choosing the type.
749 <step performance="required">
751 If the non-unknown inputs are not all of the same type category, fail.
754 <step performance="required">
756 If one or more non-unknown inputs are of a preferred type in that category,
757 resolve as that type.
760 <step performance="required">
762 Otherwise, resolve as the type of the first non-unknown input.
765 <step performance="required">
767 Coerce all inputs to the selected type.
771 <bridgehead renderas="sect2">Examples</bridgehead>
774 <title>Underspecified Types in a Union</title>
778 tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
785 Here, the unknown-type literal <literal>'b'</literal> will be resolved as type text.
790 <title>Type Conversion in a Simple Union</title>
794 tgl=> SELECT 1.2 AS "Double" UNION SELECT 1;
801 The literal <literal>1.2</> is of type <type>double precision</>,
802 the preferred type in the numeric category, so that type is used.
807 <title>Type Conversion in a Transposed Union</title>
810 Here the output type of the union is forced to match the type of
811 the first clause in the union:
814 tgl=> SELECT 1 AS "All integers"
815 tgl-> UNION SELECT CAST('2.2' AS REAL);
824 Since <type>REAL</type> is not a preferred type, the parser sees no reason
825 to select it over <type>INTEGER</type> (which is what the 1 is), and instead
826 falls back on the use-the-first-alternative rule.
827 This example demonstrates that the preferred-type mechanism doesn't encode
828 as much information as we'd like. Future versions of
829 <productname>PostgreSQL</productname> may support a more general notion of
837 <!-- Keep this comment at the end of the file
842 sgml-minimize-attributes:nil
843 sgml-always-quote-attributes:t
846 sgml-parent-document:nil
847 sgml-default-dtd-file:"./reference.ced"
848 sgml-exposed-tags:nil
849 sgml-local-catalogs:("/usr/lib/sgml/catalog")
850 sgml-local-ecat-files:nil