From 7b9fe96812982b53b01e54e63d773c5454f5f199 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 17 Dec 2000 05:55:26 +0000 Subject: [PATCH] Update type-coercion discussions to reflect current reality. --- doc/src/sgml/func.sgml | 9 +- doc/src/sgml/typeconv.sgml | 338 ++++++++++++++++++++++--------------- 2 files changed, 204 insertions(+), 143 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fdd1c34a3e..0d59899f80 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1,4 +1,4 @@ - + Functions and Operators @@ -770,7 +770,7 @@ There are two separate approaches to pattern matching provided by - Postgres: The SQL + Postgres: the SQL LIKE operator and POSIX-style regular expressions. @@ -2562,8 +2562,9 @@ END - The data types of all possible result - expressions must match. + The data types of all the result + expressions must be coercible to a single output type. + See for more detail. diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml index 5169999f98..cac31d8dfb 100644 --- a/doc/src/sgml/typeconv.sgml +++ b/doc/src/sgml/typeconv.sgml @@ -12,16 +12,17 @@ evaluating mixed-type expressions. In many cases a user will not need to understand the details of the type conversion mechanism. However, the implicit conversions done by Postgres -can affect the apparent results of a query, and these results +can affect the results of a query. When necessary, these results can be tailored by a user or programmer using explicit type coercion. This chapter introduces the Postgres - type conversion mechanisms and conventions. +type conversion mechanisms and conventions. Refer to the relevant sections in the User's Guide and Programmer's Guide -for more information on specific data types and allowed functions and operators. +for more information on specific data types and allowed functions and +operators. @@ -43,12 +44,13 @@ mixed-type expressions to be meaningful, even with user-defined types. 
-The Postgres scanner/parser decodes lexical elements -into only five fundamental categories: integers, floats, strings, names, and keywords. -Most extended types are first tokenized into strings. The SQL -language definition allows specifying type names with strings, and this mechanism -is used by Postgres -to start the parser down the correct path. For example, the query +The Postgres scanner/parser decodes lexical +elements into only five fundamental categories: integers, floats, strings, +names, and keywords. Most extended types are first tokenized into +strings. The SQL language definition allows specifying type +names with strings, and this mechanism can be used in +Postgres to start the parser down the correct +path. For example, the query tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value"; @@ -59,8 +61,9 @@ tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value"; has two strings, of type text and point. -If a type is not specified, then the placeholder type unknown -is assigned initially, to be resolved in later stages as described below. +If a type is not specified for a string, then the placeholder type +unknown is assigned initially, to be resolved in later +stages as described below. @@ -88,9 +91,13 @@ Function calls -Much of the Postgres type system is built around a rich set of -functions. Function calls have one or more arguments which, for any specific query, -must be matched to the functions available in the system catalog. +Much of the Postgres type system is built around a +rich set of functions. Function calls have one or more arguments which, for +any specific query, must be matched to the functions available in the system +catalog. Since Postgres permits function +overloading, the function name alone does not uniquely identify the function +to be called --- the parser must select the right function based on the data +types of the supplied arguments. 
@@ -100,19 +107,23 @@ Query targets -SQL INSERT statements place the results of query into a table. The expressions -in the query must be matched up with, and perhaps converted to, the target columns of the insert. +SQL INSERT and UPDATE statements place the results of +expressions into a table. The expressions in the query must be matched up +with, and perhaps converted to, the types of the target columns. -UNION queries +UNION and CASE constructs -Since all select results from a UNION SELECT statement must appear in a single set of columns, the types +Since all select results from a UNION SELECT statement must appear in a single +set of columns, the types of the results of each SELECT clause must be matched up and converted to a uniform set. +Similarly, the result expressions of a CASE construct must be coerced to +a common type so that the CASE expression as a whole has a known output type. @@ -129,7 +140,7 @@ conventions for the SQL92 standard native types such as The Postgres parser uses the convention that all type conversion functions take a single argument of the source type and are -named with the same name as the target type. Any function meeting this +named with the same name as the target type. Any function meeting these criteria is considered to be a valid conversion function, and may be used by the parser as such. This simple assumption gives the parser the power to explore type conversion possibilities without hardcoding, allowing @@ -139,19 +150,16 @@ extended user-defined types to use these same features transparently. An additional heuristic is provided in the parser to allow better guesses at proper behavior for SQL standard types. There are -five categories of types defined: boolean, string, numeric, geometric, +several basic type categories defined: boolean, +numeric, string, bitstring, datetime, timespan, geometric, network, and user-defined. 
Each category, with the exception of user-defined, has -a "preferred type" which is used to resolve ambiguities in candidates. -Each "user-defined" type is its own "preferred type", so ambiguous -expressions (those with multiple candidate parsing solutions) -with only one user-defined type can resolve to a single best choice, while those with -multiple user-defined types will remain ambiguous and throw an error. - - - -Ambiguous expressions which have candidate solutions within only one type category are -likely to resolve, while ambiguous expressions with candidates spanning multiple -categories are likely to throw an error and ask for clarification from the user. +a preferred type which is preferentially selected +when there is ambiguity. +In the user-defined category, each type is its own preferred type. +Ambiguous expressions (those with multiple candidate parsing solutions) +can often be resolved when there are multiple possible built-in types, but +they will raise an error when there are multiple choices for user-defined +types. @@ -207,12 +215,8 @@ should use this new function and will no longer do the implicit conversion using Operators - -Conversion Procedure - -Operator Evaluation - +Operator Type Resolution @@ -222,15 +226,10 @@ Check for an exact match in the pg_operator system catalog. -If one argument of a binary operator is unknown, -then assume it is the same type as the other argument. - - - - -Reverse the arguments, and look for an exact match with an operator which -points to itself as being commutative. -If found, then reverse the arguments in the parse tree and use this operator. +If one argument of a binary operator is unknown type, +then assume it is the same type as the other argument for this check. +Other cases involving unknown will never find a match at +this step. @@ -241,46 +240,63 @@ If found, then reverse the arguments in the parse tree and use this operator. Look for the best match. 
- + -Make a list of all operators of the same name. +Make a list of all operators of the same name for which the input types +match or can be coerced to match. (unknown literals are +assumed to be coercible to anything for this purpose.) If there is only +one, use it; else continue to the next step. -If only one operator is in the list, use it if the input type can be coerced, -and throw an error if the type cannot be coerced. +Run through all candidates and keep those with the most exact matches +on input types. Keep all candidates if none have any exact matches. +If only one candidate remains, use it; else continue to the next step. + + + +Run through all candidates and keep those with the most exact or +binary-compatible matches on input types. Keep all candidates if none have +any exact or binary-compatible matches. +If only one candidate remains, use it; else continue to the next step. -Keep all operators with the most explicit matches for types. Keep all if there -are no explicit matches and move to the next step. -If only one candidate remains, use it if the type can be coerced. +Run through all candidates and keep those which accept preferred types at +the most positions where type coercion will be required. +Keep all candidates if none accept preferred types. +If only one candidate remains, use it; else continue to the next step. -If any input arguments are "unknown", categorize the input candidates as -boolean, numeric, string, geometric, or user-defined. If there is a mix of -categories, or more than one user-defined type, throw an error because -the correct choice cannot be deduced without more clues. -If only one category is present, then assign the "preferred type" -to the input column which had been previously "unknown". +If any input arguments are "unknown", check the type categories accepted +at those argument positions by the remaining candidates. 
At each position, +select "string" +category if any candidate accepts that category (this bias towards string +is appropriate since an unknown-type literal does look like a string). +Otherwise, if all the remaining candidates accept the same type category, +select that category; otherwise raise an error because +the correct choice cannot be deduced without more clues. Also note whether +any of the candidates accept a preferred datatype within the selected category. +Now discard operator candidates that do not accept the selected type category; +furthermore, if any candidate accepts a preferred type at a given argument +position, discard candidates that accept non-preferred types for that +argument. -Choose the candidate with the most exact type matches, and which matches -the "preferred type" for each column category from the previous step. -If there is still more than one candidate, or if there are none, -then throw an error. +If only one candidate remains, use it. If no candidate or more than one +candidate remains, +then raise an error. - Examples @@ -372,17 +388,12 @@ tgl=> SELECT 'abc' || 'def' AS "Unspecified"; In this case there is no initial hint for which type to use, since no types are specified in the query. So, the parser looks for all candidate operators -and finds that all arguments for all the candidates are string types. It chooses -the "preferred type" for strings, text, for this query. - - - - -If a user defines a new type and defines an operator "||" to work -with it, then this query would no longer succeed as written. The parser would -now have candidate types from two categories, and could not decide which to use. +and finds that there are candidates accepting both string-category and +bitstring-category inputs. Since string category is preferred when available, +that category is selected, and then the +"preferred type" for strings, text, is used as the specific +type to resolve the unknown literals to. 
- @@ -423,11 +434,13 @@ will try to oblige. Functions -Function Evaluation +Function Call Type Resolution Check for an exact match in the pg_proc system catalog. +(Cases involving unknown will never find a match at +this step.) @@ -436,38 +449,63 @@ Look for the best match. -Make a list of all functions of the same name with the same number of arguments. - +Make a list of all functions of the same name with the same number of +arguments for which the input types +match or can be coerced to match. (unknown literals are +assumed to be coercible to anything for this purpose.) If there is only +one, use it; else continue to the next step. + + -If only one function is in the list, use it if the input types can be coerced, -and throw an error if the types cannot be coerced. - +Run through all candidates and keep those with the most exact matches +on input types. Keep all candidates if none have any exact matches. +If only one candidate remains, use it; else continue to the next step. + -Keep all functions with the most explicit matches for types. Keep all if there -are no explicit matches and move to the next step. -If only one candidate remains, use it if the type can be coerced. - +Run through all candidates and keep those with the most exact or +binary-compatible matches on input types. Keep all candidates if none have +any exact or binary-compatible matches. +If only one candidate remains, use it; else continue to the next step. + + -If any input arguments are "unknown", categorize the input candidate arguments as -boolean, numeric, string, geometric, or user-defined. If there is a mix of -categories, or more than one user-defined type, throw an error because -the correct choice cannot be deduced without more clues. -If only one category is present, then assign the "preferred type" -to the input column which had been previously "unknown". 
- +Run through all candidates and keep those which accept preferred types at +the most positions where type coercion will be required. +Keep all candidates if none accept preferred types. +If only one candidate remains, use it; else continue to the next step. + + -Choose the candidate with the most exact type matches, and which matches -the "preferred type" for each column category from the previous step. -If there is still more than one candidate, or if there are none, -then throw an error. - +If any input arguments are "unknown", check the type categories accepted +at those argument positions by the remaining candidates. At each position, +select "string" +category if any candidate accepts that category (this bias towards string +is appropriate since an unknown-type literal does look like a string). +Otherwise, if all the remaining candidates accept the same type category, +select that category; otherwise raise an error because +the correct choice cannot be deduced without more clues. Also note whether +any of the candidates accept a preferred datatype within the selected category. +Now discard function candidates that do not accept the selected type category; +furthermore, if any candidate accepts a preferred type at a given argument +position, discard candidates that accept non-preferred types for that +argument. + + + + +If only one candidate remains, use it. If no candidate or more than one +candidate remains, +then raise an error. + + + Examples @@ -539,10 +577,10 @@ tgl=> select substr(text(varchar '1234'), 3); -There are some heuristics in the parser to optimize the relationship between the -char, varchar, and text types. -For this case, substr is called directly with the varchar string -rather than inserting an explicit conversion call. +Actually, the parser is aware that text and varchar +are "binary compatible", meaning that one can be passed to a function that +accepts the other without doing any physical conversion. 
Therefore, no +explicit type conversion call is really inserted in this case. @@ -564,6 +602,8 @@ tgl=> select substr(text(1234), 3); 34 (1 row) +This succeeds because there is a conversion function text(int4) in the +system catalog. @@ -573,7 +613,7 @@ tgl=> select substr(text(1234), 3); Query Targets -Target Evaluation +Query Target Type Resolution @@ -581,15 +621,21 @@ Check for an exact match with the target. -Try to coerce the expression directly to the target type if necessary. +Otherwise, try to coerce the expression to the target type. This will succeed +if the two types are known binary-compatible, or if there is a conversion +function. If the expression is an unknown-type literal, the contents of +the literal string will be fed to the input conversion routine for the target +type. If the target is a fixed-length type (e.g. char or varchar -declared with a length) then try to find a sizing function of the same name -as the type taking two arguments, the first the type name and the second an -integer length. +declared with a length) then try to find a sizing function for the target +type. A sizing function is a function of the same name as the type, +taking two arguments of which the first is that type and the second is an +integer, and returning the same type. If one is found, it is applied, +passing the column's declared length as the second parameter. @@ -613,32 +659,62 @@ tgl=> SELECT * FROM vv; v ------ abcd -(1 row) +(1 row) + +What's really happened here is that the two unknown literals are resolved +to text by default, allowing the || operator to be +resolved as text concatenation. Then the text result of the operator +is coerced to varchar to match the target column type. (But, since the +parser knows that text and varchar are binary-compatible, this coercion +is implicit and does not insert any real function call.) 
Finally, the +sizing function varchar(varchar,int4) is found in the system +catalogs and applied to the operator's result and the stored column length. +This type-specific function performs the desired truncation. - -UNION Queries + +UNION and CASE Constructs -The UNION construct is somewhat different in that it must match up -possibly dissimilar types to become a single result set. +The UNION and CASE constructs must match up possibly dissimilar types to +become a single result set. The resolution algorithm is applied separately to +each output column of a UNION. CASE uses the identical algorithm to match +up its result expressions. -UNION Evaluation +UNION and CASE Type Resolution + + + +If all inputs are of type unknown, resolve as type +text (the preferred type for string category). +Otherwise, ignore the unknown inputs while choosing the type. + + + + +If the non-unknown inputs are not all of the same type category, raise an +error. + -Check for identical types for all results. +If one or more non-unknown inputs are of a preferred type in that category, +resolve as that type. -Coerce each result from the UNION clauses to match the type of the -first SELECT clause or the target column. +Otherwise, resolve as the type of the first non-unknown input. + + + + +Coerce all inputs to the selected type. @@ -657,6 +733,7 @@ tgl=> SELECT text 'a' AS "Text" UNION SELECT 'b'; b (2 rows) +Here, the unknown-type literal 'b' will be resolved as type text. 
@@ -679,43 +756,26 @@ tgl=> SELECT 1.2 AS "Float8" UNION SELECT 1; Transposed UNION -The types of the union are forced to match the types of +Here the output type of the union is forced to match the type of the first/top clause in the union: tgl=> SELECT 1 AS "All integers" -tgl-> UNION SELECT '2.2'::float4 -tgl-> UNION SELECT 3.3; +tgl-> UNION SELECT '2.2'::float4; All integers -------------- 1 2 - 3 -(3 rows) +(2 rows) -An alternate parser strategy could be to choose the "best" type of the bunch, but -this is more difficult because of the nice recursion technique used in the -parser. However, the "best" type is used when selecting into -a table: - - -tgl=> CREATE TABLE ff (f float); -CREATE -tgl=> INSERT INTO ff -tgl-> SELECT 1 -tgl-> UNION SELECT '2.2'::float4 -tgl-> UNION SELECT 3.3; -INSERT 0 3 -tgl=> SELECT f AS "Floating point" from ff; - Floating point ------------------- - 1 - 2.20000004768372 - 3.3 -(3 rows) - +Since float4 is not a preferred type, the parser sees no reason to select it +over int4, and instead falls back on the use-the-first-alternative rule. +This example demonstrates that the preferred-type mechanism doesn't encode +as much information as we'd like. Future versions of +Postgres may support a more general notion of +type preferences. -- 2.49.0
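The "UNION and CASE Type Resolution" steps added by this patch can be sketched as follows. This is a minimal illustration in Python, not the parser's actual code: the `CATEGORY` and `PREFERRED` tables are a hypothetical, hand-written subset of the type catalog, and `resolve_common_type` is an illustrative name that does not exist in Postgres. The final "coerce all inputs" step is left out, since it depends on the conversion functions available in the catalog.

```python
# Hypothetical subset of the type catalog, for illustration only.
# Maps a type name to its type category.
CATEGORY = {
    "bool": "boolean",
    "int4": "numeric",
    "float4": "numeric",
    "float8": "numeric",
    "text": "string",
    "varchar": "string",
}

# Assumed preferred type of each category in the subset above.
PREFERRED = {
    "boolean": "bool",
    "numeric": "float8",
    "string": "text",
}


def resolve_common_type(input_types):
    """Pick one output type for a UNION column or a CASE expression."""
    if not input_types:
        raise ValueError("at least one input type is required")

    # Step 1: if every input is unknown, resolve as text;
    # otherwise ignore the unknown inputs while choosing.
    known = [t for t in input_types if t != "unknown"]
    if not known:
        return "text"

    # Step 2: all non-unknown inputs must share one type category.
    categories = {CATEGORY[t] for t in known}
    if len(categories) > 1:
        raise TypeError("inputs span several type categories: %s"
                        % sorted(categories))
    category = categories.pop()

    # Step 3: a preferred type among the inputs wins.
    if PREFERRED[category] in known:
        return PREFERRED[category]

    # Step 4: otherwise fall back on the first non-unknown input.
    return known[0]


if __name__ == "__main__":
    print(resolve_common_type(["text", "unknown"]))  # text 'a' UNION 'b'
    print(resolve_common_type(["float8", "int4"]))   # 1.2 UNION 1
    print(resolve_common_type(["int4", "float4"]))   # 1 UNION '2.2'::float4
```

The three calls mirror the UNION examples in the patch. In the last one, float4 is not the preferred numeric type, so the sketch falls back on the first input, int4, which is the behavior the "Transposed UNION" example describes.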