granicus.if.org Git - postgresql/blob - doc/src/sgml/typeconv.sgml

   1 <chapter Id="typeconv">
   2 <title>Type Conversion</title>
   3
   4 <para>
   5 <acronym>SQL</acronym> queries can, intentionally or not, require
   6 mixing of different data types in the same expression.
   7 <productname>Postgres</productname> has extensive facilities for
   8 evaluating mixed-type expressions.
   9 </para>
  10
  11 <para>
  12 In many cases a user will not need
  13 to understand the details of the type conversion mechanism.
  14 However, the implicit conversions done by <productname>Postgres</productname>
  15 can affect the apparent results of a query, and these results
  16 can be tailored by a user or programmer
  17 using <emphasis>explicit</emphasis> type coercion.
  18 </para>
  19
  20 <para>
  21 This chapter introduces the <productname>Postgres</productname>
  22  type conversion mechanisms and conventions.
  23 Refer to the relevant sections in the User's Guide and Programmer's Guide
  24 for more information on specific data types and allowed functions and operators.
  25 </para>
  26
  27 <para>
  28 The Programmer's Guide has more details on the exact algorithms used for
  29 implicit type conversion and coercion.
  30 </para>
  31
  32 <sect1>
  33 <title>Overview</title>
  34
  35 <para>
  36 <acronym>SQL</acronym> is a strongly typed language. That is, every data item
  37 has an associated data type which determines its behavior and allowed usage.
  38 <productname>Postgres</productname> has an extensible type system which is
  39 much more general and flexible than other <acronym>RDBMS</acronym> implementations.
  40 Hence, most type conversion behavior in <productname>Postgres</productname>
  41 should be governed by general rules rather than by ad-hoc heuristics to allow
  42 mixed-type expressions to be meaningful, even with user-defined types.
  43 </para>
  44
  45 <para>
  46 The <productname>Postgres</productname> scanner/parser decodes lexical elements
  47 into only five fundamental categories: integers, floats, strings, names, and keywords.
  48 Most extended types are first tokenized into strings. The <acronym>SQL</acronym>
  49 language definition allows specifying type names with strings, and this mechanism
  50 is used by <productname>Postgres</productname>
  51 to start the parser down the correct path. For example, the query
  52
  53 <programlisting>
  54 tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
  55 Label |Value
  56 ------+-----
  57 Origin|(0,0)
  58 (1 row)
  59 </programlisting>
  60
  61 has two strings, of type <type>text</type> and <type>point</type>.
  62 If a type is not specified, then the placeholder type <type>unknown</type>
  63 is assigned initially, to be resolved in later stages as described below.
  64 </para>
  65
  66 <para>
  67 There are four fundamental <acronym>SQL</acronym> constructs requiring
  68 distinct type conversion rules in the <productname>Postgres</productname>
  69 parser:
  70 </para>
  71
  72 <variablelist>
  73 <varlistentry>
  74 <term>
  75 Operators
  76 </term>
  77 <listitem>
  78 <para>
  79 <productname>Postgres</productname> allows expressions with
  80 left- and right-unary (one argument) operators,
  81 as well as binary (two argument) operators.
  82 </para>
  83 </listitem>
  84 </varlistentry>
  85 <varlistentry>
  86 <term>
  87 Function calls
  88 </term>
  89 <listitem>
  90 <para>
  91 Much of the <productname>Postgres</productname> type system is built around a rich set of
  92 functions. Function calls have one or more arguments which, for any specific query,
  93 must be matched to the functions available in the system catalog.
  94 </para>
  95 </listitem>
  96 </varlistentry>
  97 <varlistentry>
  98 <term>
  99 Query targets
 100 </term>
 101 <listitem>
 102 <para>
 103 <acronym>SQL</acronym> INSERT statements place the results of query into a table. The expressions
 104 in the query must be matched up with, and perhaps converted to, the target columns of the insert.
 105 </para>
 106 </listitem>
 107 </varlistentry>
 108 <varlistentry>
 109 <term>
 110 UNION queries
 111 </term>
 112 <listitem>
 113 <para>
 114 Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
 115 of each SELECT clause must be matched up and converted to a uniform set.
 116 </para>
 117 </listitem>
 118 </varlistentry>
 119 </variablelist>
 120
 121 <para>
 122 Many of the general type conversion rules use simple conventions built on
 123 the <productname>Postgres</productname> function and operator system tables.
 124 There are some heuristics included in the conversion rules to better support
 125 conventions for the <acronym>SQL92</acronym> standard native types such as
 126 <type>smallint</type>, <type>integer</type>, and <type>float</type>.
 127 </para>
 128
 129 <para>
 130 The <productname>Postgres</productname> parser uses the convention that all
 131 type conversion functions take a single argument of the source type and are
 132 named with the same name as the target type. Any function meeting this
 133 criteria is considered to be a valid conversion function, and may be used
 134 by the parser as such. This simple assumption gives the parser the power
 135 to explore type conversion possibilities without hardcoding, allowing
 136 extended user-defined types to use these same features transparently.
 137 </para>
 138
 139 <para>
 140 An additional heuristic is provided in the parser to allow better guesses
 141 at proper behavior for <acronym>SQL</acronym> standard types. There are
 142 five categories of types defined: boolean, string, numeric, geometric,
 143 and user-defined. Each category, with the exception of user-defined, has
 144 a "preferred type" which is used to resolve ambiguities in candidates.
 145 Each "user-defined" type is its own "preferred type", so ambiguous
 146 expressions (those with multiple candidate parsing solutions)
 147 with only one user-defined type can resolve to a single best choice, while those with
 148 multiple user-defined types will remain ambiguous and throw an error.
 149 </para>
 150
 151 <para>
 152 Ambiguous expressions which have candidate solutions within only one type category are
 153 likely to resolve, while ambiguous expressions with candidates spanning multiple
 154 categories are likely to throw an error and ask for clarification from the user.
 155 </para>
 156
 157 <sect2>
 158 <title>Guidelines</title>
 159
 160 <para>
 161 All type conversion rules are designed with several principles in mind:
 162
 163 <itemizedlist mark="bullet" spacing="compact">
 164 <listitem>
 165 <para>
 166 Implicit conversions should never have surprising or unpredictable outcomes.
 167 </para>
 168 </listitem>
 169
 170 <listitem>
 171 <para>
 172 User-defined types, of which the parser has no a-priori knowledge, should be
 173 "higher" in the type hierarchy. In mixed-type expressions, native types shall always
 174 be converted to a user-defined type (of course, only if conversion is necessary).
 175 </para>
 176 </listitem>
 177
 178 <listitem>
 179 <para>
 180 User-defined types are not related. Currently, <productname>Postgres</productname>
 181 does not have information available to it on relationships between types, other than
 182 hardcoded heuristics for built-in types and implicit relationships based on available functions
 183 in the catalog.
 184 </para>
 185 </listitem>
 186
 187 <listitem>
 188 <para>
 189 There should be no extra overhead from the parser or executor
 190 if a query does not need implicit type conversion.
 191 That is, if a query is well formulated and the types already match up, then the query should proceed
 192 without spending extra time in the parser and without introducing unnecessary implicit conversion
 193 functions into the query.
 194 </para>
 195
 196 <para>
 197 Additionally, if a query usually requires an implicit conversion for a function, and
 198 if then the user defines an explicit function with the correct argument types, the parser
 199 should use this new function and will no longer do the implicit conversion using the old function.
 200 </para>
 201 </listitem>
 202 </itemizedlist>
 203 </para>
 204 </sect2>
 205 </sect1>
 206
 207 <sect1>
 208 <title>Operators</title>
 209
 210 <sect2>
 211 <title>Conversion Procedure</title>
 212
 213 <procedure>
 214 <title>Operator Evaluation</title>
 215
 216
 217 <step performance="required">
 218 <para>
 219 Check for an exact match in the pg_operator system catalog.
 220 </para>
 221
 222 <substeps>
 223 <step performance="optional">
 224 <para>
 225 If one argument of a binary operator is <type>unknown</type>,
 226 then assume it is the same type as the other argument.
 227 </para>
 228 </step>
 229 <step performance="required">
 230 <para>
 231 Reverse the arguments, and look for an exact match with an operator which
 232 points to itself as being commutative.
 233 If found, then reverse the arguments in the parse tree and use this operator.
 234 </para>
 235 </step>
 236 </substeps>
 237 </step>
 238
 239 <step performance="required">
 240 <para>
 241 Look for the best match.
 242 </para>
 243 <substeps>
 244 <step performance="optional">
 245 <para>
 246 Make a list of all operators of the same name.
 247 </para>
 248 </step>
 249 <step performance="required">
 250 <para>
 251 If only one operator is in the list, use it if the input type can be coerced,
 252 and throw an error if the type cannot be coerced.
 253 </para>
 254 </step>
 255 <step performance="required">
 256 <para>
 257 Keep all operators with the most explicit matches for types. Keep all if there
 258 are no explicit matches and move to the next step.
 259 If only one candidate remains, use it if the type can be coerced.
 260 </para>
 261 </step>
 262 <step performance="required">
 263 <para>
 264 If any input arguments are "unknown", categorize the input candidates as
 265 boolean, numeric, string, geometric, or user-defined. If there is a mix of
 266 categories, or more than one user-defined type, throw an error because
 267 the correct choice cannot be deduced without more clues.
 268 If only one category is present, then assign the "preferred type"
 269 to the input column which had been previously "unknown".
 270 </para>
 271 </step>
 272 <step performance="required">
 273 <para>
 274 Choose the candidate with the most exact type matches, and which matches
 275 the "preferred type" for each column category from the previous step.
 276 If there is still more than one candidate, or if there are none,
 277 then throw an error.
 278 </para>
 279 </step>
 280 </substeps>
 281 </step>
 282 </procedure>
 283 </sect2>
 284
 285 <sect2>
 286 <title>Examples</title>
 287
 288 <sect3>
 289 <title>Exponentiation Operator</title>
 290
 291 <para>
 292 There is only one exponentiation
 293 operator defined in the catalog, and it takes <type>float8</type> arguments.
 294 The scanner assigns an initial type of <type>int4</type> to both arguments
 295 of this query expression:
 296 <programlisting>
 297 tgl=> select 2 ^ 3 AS "Exp";
 298 Exp
 299 ---
 300   8
 301 (1 row)
 302 </programlisting>
 303
 304 So the parser does a type conversion on both operands and the query
 305 is equivalent to
 306
 307 <programlisting>
 308 tgl=> select float8(2) ^ float8(3) AS "Exp";
 309 Exp
 310 ---
 311   8
 312 (1 row)
 313 </programlisting>
 314
 315 or
 316
 317 <programlisting>
 318 tgl=> select 2.0 ^ 3.0 AS "Exp";
 319 Exp
 320 ---
 321   8
 322 (1 row)
 323 </programlisting>
 324
 325 <note>
 326 <para>
 327 This last form has the least overhead, since no functions are called to do
 328 implicit type conversion. This is not an issue for small queries, but may
 329 have an impact on the performance of queries involving large tables.
 330 </para>
 331 </note>
 332 </para>
 333 </sect3>
 334
 335 <sect3>
 336 <title>String Concatenation</title>
 337
 338 <para>
 339 A string-like syntax is used for working with string types as well as for
 340 working with complex extended types.
 341 Strings with unspecified type are matched with likely operator candidates.
 342 </para>
 343
 344 <para>
 345 One unspecified argument:
 346 <programlisting>
 347 tgl=> SELECT text 'abc' || 'def' AS "Text and Unknown";
 348 Text and Unknown
 349 ----------------
 350 abcdef
 351 (1 row)
 352 </programlisting>
 353 </para>
 354
 355 <para>
 356 In this case the parser looks to see if there is an operator taking <type>text</type>
 357 for both arguments. Since there is, it assumes that the second argument should
 358 be interpreted as of type <type>text</type>.
 359 </para>
 360
 361 <para>
 362 Concatenation on unspecified types:
 363 <programlisting>
 364 tgl=> SELECT 'abc' || 'def' AS "Unspecified";
 365 Unspecified
 366 -----------
 367 abcdef
 368 (1 row)
 369 </programlisting>
 370 </para>
 371
 372 <para>
 373 In this case there is no initial hint for which type to use, since no types
 374 are specified in the query. So, the parser looks for all candidate operators
 375 and finds that all arguments for all the candidates are string types. It chooses
 376 the "preferred type" for strings, <type>text</type>, for this query.
 377 </para>
 378
 379 <note>
 380 <para>
 381 If a user defines a new type and defines an operator <quote>||</quote> to work
 382 with it, then this query would no longer succeed as written. The parser would
 383 now have candidate types from two categories, and could not decide which to use.
 384 </para>
 385 </note>
 386 </sect3>
 387
 388 <sect3>
 389 <title>Factorial</title>
 390
 391 <para>
 392 This example illustrates an interesting result. Traditionally, the
 393 factorial operator is defined for integers only. The <productname>Postgres</productname>
 394 operator catalog has only one entry for factorial, taking an integer operand.
 395 If given a non-integer numeric argument, <productname>Postgres</productname>
 396 will try to convert that argument to an integer for evaluation of the
 397 factorial.
 398
 399 <programlisting>
 400 tgl=> select (4.3 !);
 401 ?column?
 402 --------
 403       24
 404 (1 row)
 405 </programlisting>
 406
 407 <note>
 408 <para>
 409 Of course, this leads to a mathematically suspect result,
 410 since in principle the factorial of a non-integer is not defined.
 411 However, the role of a database is not to teach mathematics, but
 412 to be a tool for data manipulation. If a user chooses to take the
 413 factorial of a floating point number, <productname>Postgres</productname>
 414 will try to oblige.
 415 </para>
 416 </note>
 417 </para>
 418 </sect3>
 419 </sect2>
 420 </sect1>
 421
 422 <sect1>
 423 <title>Functions</title>
 424
 425 <procedure>
 426 <title>Function Evaluation</title>
 427
 428 <step performance="required">
 429 <para>
 430 Check for an exact match in the pg_proc system catalog.
 431 </para></step>
 432 <step performance="required">
 433 <para>
 434 Look for the best match.
 435 </para>
 436 <substeps>
 437 <step performance="required">
 438 <para>
 439 Make a list of all functions of the same name with the same number of arguments.
 440 </para></step>
 441 <step performance="required">
 442 <para>
 443 If only one function is in the list, use it if the input types can be coerced,
 444 and throw an error if the types cannot be coerced.
 445 </para></step>
 446 <step performance="required">
 447 <para>
 448 Keep all functions with the most explicit matches for types. Keep all if there
 449 are no explicit matches and move to the next step.
 450 If only one candidate remains, use it if the type can be coerced.
 451 </para></step>
 452 <step performance="required">
 453 <para>
 454 If any input arguments are "unknown", categorize the input candidate arguments as
 455 boolean, numeric, string, geometric, or user-defined. If there is a mix of
 456 categories, or more than one user-defined type, throw an error because
 457 the correct choice cannot be deduced without more clues.
 458 If only one category is present, then assign the "preferred type"
 459 to the input column which had been previously "unknown".
 460 </para></step>
 461 <step performance="required">
 462 <para>
 463 Choose the candidate with the most exact type matches, and which matches
 464 the "preferred type" for each column category from the previous step.
 465 If there is still more than one candidate, or if there are none,
 466 then throw an error.
 467 </para></step>
 468 </substeps>
 469 </step>
 470 </procedure>
 471 <sect2>
 472 <title>Examples</title>
 473
 474 <sect3>
 475 <title>Factorial Function</title>
 476
 477 <para>
 478 There is only one factorial function defined in the pg_proc catalog.
 479 So the following query automatically converts the <type>int2</type> argument
 480 to <type>int4</type>:
 481
 482 <programlisting>
 483 tgl=> select int4fac(int2 '4');
 484 int4fac
 485 -------
 486      24
 487 (1 row)
 488 </programlisting>
 489
 490 and is actually transformed by the parser to
 491 <programlisting>
 492 tgl=> select int4fac(int4(int2 '4'));
 493 int4fac
 494 -------
 495      24
 496 (1 row)
 497 </programlisting>
 498 </para>
 499 </sect3>
 500
 501 <sect3>
 502 <title>Substring Function</title>
 503
 504 <para>
 505 There are two <function>substr</function> functions declared in pg_proc. However,
 506 only one takes two arguments, of types <type>text</type> and <type>int4</type>.
 507 </para>
 508
 509 <para>
 510 If called with a string constant of unspecified type, the type is matched up
 511 directly with the only candidate function type:
 512 <programlisting>
 513 tgl=> select substr('1234', 3);
 514 substr
 515 ------
 516     34
 517 (1 row)
 518 </programlisting>
 519 </para>
 520
 521 <para>
 522 If the string is declared to be of type <type>varchar</type>, as might be the case
 523 if it comes from a table, then the parser will try to coerce it to become <type>text</type>:
 524 <programlisting>
 525 tgl=> select substr(varchar '1234', 3);
 526 substr
 527 ------
 528     34
 529 (1 row)
 530 </programlisting>
 531 which is transformed by the parser to become
 532 <programlisting>
 533 tgl=> select substr(text(varchar '1234'), 3);
 534 substr
 535 ------
 536     34
 537 (1 row)
 538 </programlisting>
 539 </para>
 540 <note>
 541 <para>
 542 There are some heuristics in the parser to optimize the relationship between the
 543 <type>char</type>, <type>varchar</type>, and <type>text</type> types.
 544 For this case, <function>substr</function> is called directly with the <type>varchar</type> string
 545 rather than inserting an explicit conversion call.
 546 </para>
 547 </note>
 548
 549 <para>
 550 And, if the function is called with an <type>int4</type>, the parser will
 551 try to convert that to <type>text</type>:
 552 <programlisting>
 553 tgl=> select substr(1234, 3);
 554 substr
 555 ------
 556     34
 557 (1 row)
 558 </programlisting>
 559 actually executes as
 560 <programlisting>
 561 tgl=> select substr(text(1234), 3);
 562 substr
 563 ------
 564     34
 565 (1 row)
 566 </programlisting>
 567 </para>
 568 </sect3>
 569 </sect2>
 570 </sect1>
 571
 572 <sect1>
 573 <title>Query Targets</title>
 574
 575 <procedure>
 576 <title>Target Evaluation</title>
 577
 578 <step performance="required">
 579 <para>
 580 Check for an exact match with the target.
 581 </para></step>
 582 <step performance="required">
 583 <para>
 584 Try to coerce the expression directly to the target type if necessary.
 585 </para></step>
 586
 587 <step performance="required">
 588 <para>
 589 If the target is a fixed-length type (e.g. <type>char</type> or <type>varchar</type>
 590 declared with a length) then try to find a sizing function of the same name
 591 as the type taking two arguments, the first the type name and the second an
 592 integer length.
 593 </para></step>
 594
 595 </procedure>
 596
 597 <sect2>
 598 <title>Examples</title>
 599
 600 <sect3>
 601 <title><type>varchar</type> Storage</title>
 602
 603 <para>
 604 For a target column declared as <type>varchar(4)</type> the following query
 605 ensures that the target is sized correctly:
 606
 607 <programlisting>
 608 tgl=> CREATE TABLE vv (v varchar(4));
 609 CREATE
 610 tgl=> INSERT INTO vv SELECT 'abc' || 'def';
 611 INSERT 392905 1
 612 tgl=> select * from vv;
 613 v
 614 ----
 615 abcd
 616 (1 row)
 617 </programlisting>
 618 </para>
 619 </sect3>
 620 </sect2>
 621 </sect1>
 622
 623 <sect1>
 624 <title>UNION Queries</title>
 625
 626 <para>
 627 The UNION construct is somewhat different in that it must match up
 628 possibly dissimilar types to become a single result set.
 629 </para>
 630 <procedure>
 631 <title>UNION Evaluation</title>
 632
 633 <step performance="required">
 634 <para>
 635 Check for identical types for all results.
 636 </para></step>
 637
 638 <step performance="required">
 639 <para>
 640 Coerce each result from the UNION clauses to match the type of the
 641 first SELECT clause or the target column.
 642 </para></step>
 643 </procedure>
 644
 645 <sect2>
 646 <title>Examples</title>
 647
 648 <sect3>
 649 <title>Underspecified Types</title>
 650
 651 <para>
 652 <programlisting>
 653 tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b';
 654 Text
 655 ----
 656 a
 657 b
 658 (2 rows)
 659 </programlisting>
 660 </para>
 661 </sect3>
 662
 663 <sect3>
 664 <title>Simple UNION</title>
 665
 666 <para>
 667 <programlisting>
 668 tgl=> SELECT 1.2 AS Float8 UNION SELECT 1;
 669 Float8
 670 ------
 671      1
 672    1.2
 673 (2 rows)
 674 </programlisting>
 675 </para>
 676 </sect3>
 677
 678 <sect3>
 679 <title>Transposed UNION</title>
 680
 681 <para>
 682 The types of the union are forced to match the types of
 683 the first/top clause in the union:
 684
 685 <programlisting>
 686 tgl=> SELECT 1 AS "All integers"
 687 tgl-> UNION SELECT '2.2'::float4
 688 tgl-> UNION SELECT 3.3;
 689 All integers
 690 ------------
 691            1
 692            2
 693            3
 694 (3 rows)
 695 </programlisting>
 696 </para>
 697 <para>
 698 An alternate parser strategy could be to choose the "best" type of the bunch, but
 699 this is more difficult because of the nice recursion technique used in the
 700 parser. However, the "best" type is used when selecting <emphasis>into</emphasis>
 701 a table:
 702
 703 <programlisting>
 704 tgl=> CREATE TABLE ff (f float);
 705 CREATE
 706 tgl=> INSERT INTO ff
 707 tgl-> SELECT 1
 708 tgl-> UNION SELECT '2.2'::float4
 709 tgl-> UNION SELECT 3.3;
 710 INSERT 0 3
 711 tgl=> SELECT f AS "Floating point" from ff;
 712   Floating point
 713 ----------------
 714                1
 715 2.20000004768372
 716              3.3
 717 (3 rows)
 718 </programlisting>
 719 </para>
 720 </sect3>
 721 </sect2>
 722 </sect1>
 723 </chapter>