From: Tom Lane Date: Mon, 1 Apr 2019 20:20:22 +0000 (-0400) Subject: Improve documentation about our XML functionality. X-Git-Tag: REL_12_BETA1~344 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=12d46ac392d00fa0d103f445aa4997f97029e007;p=postgresql Improve documentation about our XML functionality. Add a section explaining how our XML features depart from current versions of the SQL standard. Update and clarify the descriptions of some XML functions. Chapman Flack, reviewed by Ryan Lambert Discussion: https://postgr.es/m/5BD1284C.1010305@anastigmatix.net Discussion: https://postgr.es/m/5C81F8C0.6090901@anastigmatix.net Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com --- diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 52c28e7d74..cced57b2a8 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4219,6 +4219,11 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11 value is a full document or only a content fragment. + + Limits and compatibility notes for the xml data type + can be found in . + + Creating XML Values diff --git a/doc/src/sgml/features.sgml b/doc/src/sgml/features.sgml index 6c22d69867..d6fa2c0c72 100644 --- a/doc/src/sgml/features.sgml +++ b/doc/src/sgml/features.sgml @@ -16,7 +16,8 @@ Language SQL. A revised version of the standard is released from time to time; the most recent update appearing in 2011. The 2011 version is referred to as ISO/IEC 9075:2011, or simply as SQL:2011. - The versions prior to that were SQL:2008, SQL:2003, SQL:1999, and SQL-92. Each version + The versions prior to that were SQL:2008, SQL:2006, SQL:2003, SQL:1999, + and SQL-92. Each version replaces the previous one, so claims of conformance to earlier versions have no official merit. PostgreSQL development aims for @@ -155,4 +156,335 @@ + + XML Limits and Conformance to SQL/XML + + + SQL/XML + limits and conformance + + + + Significant revisions to the XML-related specifications in ISO/IEC 9075-14 + (SQL/XML) were introduced with SQL:2006. + PostgreSQL's implementation of the XML data + type and related functions largely follows the earlier 2003 edition, + with some borrowing from later editions. In particular: + + + + Where the current standard provides a family of XML data types + to hold document or content in + untyped or XML Schema-typed variants, and a type + XML(SEQUENCE) to hold arbitrary pieces of XML content, + PostgreSQL provides the single + xml type, which can hold document or + content. There is no equivalent of the + standard's sequence type. + + + + + + PostgreSQL provides two functions + introduced in SQL:2006, but in variants that use the XPath 1.0 + language, rather than XML Query as specified for them in the + standard. + + + + + + + This section presents some of the resulting differences you may encounter. + + + + Queries are restricted to XPath 1.0 + + + The PostgreSQL-specific functions + xpath() and xpath_exists() + query XML documents using the XPath language. + PostgreSQL also provides XPath-only variants + of the standard functions XMLEXISTS and + XMLTABLE, which officially use + the XQuery language. For all of these functions, + PostgreSQL relies on the + libxml2 library, which provides only XPath 1.0. + + + + There is a strong connection between the XQuery language and XPath + versions 2.0 and later: any expression that is syntactically valid and + executes successfully in both produces the same result (with a minor + exception for expressions containing numeric character references or + predefined entity references, which XQuery replaces with the + corresponding character while XPath leaves them alone). But there is + no such connection between these languages and XPath 1.0; it was an + earlier language and differs in many respects. + + + + There are two categories of limitation to keep in mind: the restriction + from XQuery to XPath for the functions specified in the SQL standard, and + the restriction of XPath to version 1.0 for both the standard and the + PostgreSQL-specific functions. + + + + Restriction of XQuery to XPath + + + Features of XQuery beyond those of XPath include: + + + + + XQuery expressions can construct and return new XML nodes, in + addition to all possible XPath values. XPath can create and return + values of the atomic types (numbers, strings, and so on) but can + only return XML nodes that were already present in documents + supplied as input to the expression. + + + + + + XQuery has control constructs for iteration, sorting, and grouping. + + + + + + XQuery allows declaration and use of local functions. + + + + + + + Recent XPath versions begin to offer capabilities overlapping with + these (such as functional-style for-each and + sort, anonymous functions, and + parse-xml to create a node from a string), + but such features were not available before XPath 3.0. + + + + + Restriction of XPath to 1.0 + + + For developers familiar with XQuery and XPath 2.0 or later, XPath 1.0 + presents a number of differences to contend with: + + + + + The fundamental type of an XQuery/XPath expression, the + sequence, which can contain XML nodes, atomic values, + or both, does not exist in XPath 1.0. A 1.0 expression can only + produce a node-set (containing zero or more XML nodes), or a single + atomic value. + + + + + + Unlike an XQuery/XPath sequence, which can contain any desired + items in any desired order, an XPath 1.0 node-set has no + guaranteed order and, like any set, does not allow multiple + appearances of the same item. + + + The libxml2 library does seem to + always return node-sets to PostgreSQL + with their members in the same relative order they had in the + input document. Its documentation does not commit to this + behavior, and an XPath 1.0 expression cannot control it. + + + + + + + + While XQuery/XPath provides all of the types defined in XML Schema + and many operators and functions over those types, XPath 1.0 has only + node-sets and the three atomic types boolean, + double, and string. + + + + + + XPath 1.0 has no conditional operator. An XQuery/XPath expression + such as if ( hat ) then hat/@size else "no hat" + has no XPath 1.0 equivalent. + + + + + + XPath 1.0 has no ordering comparison operator for strings. Both + "cat" < "dog" and + "cat" > "dog" are false, because each is a + numeric comparison of two NaNs. In contrast, + = and != do compare the strings + as strings. + + + + + + XPath 1.0 blurs the distinction between + value comparisons and + general comparisons as XQuery/XPath define + them. Both sale/@hatsize = 7 and + sale/@customer = "alice" are existentially + quantified comparisons, true if there is + any sale with the given value for the + attribute, but sale/@taxable = false() is a + value comparison to the + effective boolean value of a whole node-set. + It is true only if no sale has + a taxable attribute at all. + + + + + + In the XQuery/XPath data model, a document + node can have either document form (i.e., exactly one + top-level element, with only comments and processing instructions + outside of it) or content form (with those constraints + relaxed). Its equivalent in XPath 1.0, the + root node, can only be in document form. + This is part of the reason an xml value passed as the + context item to any PostgreSQL + XPath-based function must be in document form. + + + + + + + The differences highlighted here are not all of them. In XQuery and + the 2.0 and later versions of XPath, there is an XPath 1.0 compatibility + mode, and the W3C lists of + function library changes + and + language changes + applied in that mode offer a more complete (but still not exhaustive) + account of the differences. The compatibility mode cannot make the + later languages exactly equivalent to XPath 1.0. + + + + + Mappings between SQL and XML data types and values + + + In SQL:2006 and later, both directions of conversion between standard SQL + data types and the XML Schema types are specified precisely. However, the + rules are expressed using the types and semantics of XQuery/XPath, and + have no direct application to the different data model of XPath 1.0. + + + + When PostgreSQL maps SQL data values to XML + (as in xmlelement), or XML to SQL (as in the output + columns of xmltable), except for a few cases + treated specially, PostgreSQL simply assumes + that the XML data type's XPath 1.0 string form will be valid as the + text-input form of the SQL datatype, and conversely. This rule has the + virtue of simplicity while producing, for many data types, results similar + to the mappings specified in the standard. + + + + Where interoperability with other systems is a concern, for some data + types, it may be necessary to use data type formatting functions (such + as those in ) explicitly to + produce the standard mappings. + + + + + + + Incidental limits of the implementation + + + + This section concerns limits that are not inherent in the + libxml2 library, but apply to the current + implementation in PostgreSQL. + + + + Only <literal>BY VALUE</literal> passing mechanism is supported + + + The SQL standard defines two passing mechanisms + that apply when passing an XML argument from SQL to an XML function or + receiving a result: BY REF, in which a particular XML + value retains its node identity, and BY VALUE, in which + the content of the XML is passed but node identity is not preserved. A + mechanism can be specified before a list of parameters, as the default + mechanism for all of them, or after any parameter, to override the + default. + + + + To illustrate the difference, if + x is an XML value, these two queries in + an SQL:2006 environment would produce true and false, respectively: + + +SELECT XMLQUERY('$a is $b' PASSING BY REF x AS a, x AS b NULL ON EMPTY); +SELECT XMLQUERY('$a is $b' PASSING BY VALUE x AS a, x AS b NULL ON EMPTY); + + + + + PostgreSQL will accept + BY VALUE or BY REF in an + XMLEXISTS or XMLTABLE + construct, but it ignores them. The xml data type holds + a character-string serialized representation, so there is no node + identity to preserve, and passing is always effectively BY + VALUE. + + + + + Cannot pass named parameters to queries + + + The XPath-based functions support passing one parameter to serve as the + XPath expression's context item, but do not support passing additional + values to be available to the expression as named parameters. + + + + + No <type>XML(SEQUENCE)</type> type + + + The PostgreSQL xml data type + can only hold a value in DOCUMENT + or CONTENT form. An XQuery/XPath expression + context item must be a single XML node or atomic value, but XPath 1.0 + further restricts it to be only an XML node, and has no node type + allowing CONTENT. The upshot is that a + well-formed DOCUMENT is the only form of XML value + that PostgreSQL can supply as an XPath + context item. + + + + + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d24901126f..2aa1d1fc29 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10249,16 +10249,25 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple + XML Functions + + XML Functions + + The functions and function-like expressions described in this - section operate on values of type xml. Check xml. See for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from - type xml are not repeated here. Use of most of these - functions requires the installation to have been built + type xml are documented there, not in this section. + + + + Use of most of these functions + requires PostgreSQL to have been built with configure --with-libxml. @@ -10453,8 +10462,8 @@ SELECT xmlelement(name foo, xmlattributes('xyz' as bar), encoding, depending on the setting of the configuration parameter . The particular behavior for individual data types is expected to evolve in order to align the - SQL and PostgreSQL data types with the XML Schema specification, - at which point a more precise description will appear. + PostgreSQL mappings with those specified in SQL:2006 and later, + as discussed in . @@ -10696,10 +10705,13 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab; - The function xmlexists returns true if the - XPath expression in the first argument returns any nodes, and - false otherwise. (If either argument is null, the result is - null.) + The function xmlexists evaluates an XPath 1.0 + expression (the first argument), with the passed XML value as its context + item. The function returns false if the result of that evaluation + yields an empty node-set, true if it yields any other value. The + function returns null if any argument is null. A nonnull value + passed as the context item must be an XML document, not a content + fragment or any non-XML value. @@ -10715,25 +10727,14 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE 'T - The BY REF or BY VALUE clauses - have no effect in PostgreSQL, but are allowed - for compatibility with other implementations. Per the SQL - standard, the one that precedes any argument is required, and indicates - the default for arguments that follow, and one may follow any argument to - override the default. - PostgreSQL ignores BY REF - and passes by value always. - - - - In the SQL standard, an xmlexists - construct evaluates an expression in the XQuery language, allows passing - values for named parameters in the expression as well as for the context - item, and does not require the passed values to be documents, or even of - XML type. - In PostgreSQL, this construct currently only - evaluates an XPath 1.0 expression, and allows passing only one value, - which must be an XML document, to be the context item. + The BY REF and BY VALUE clauses + are accepted in PostgreSQL, but are ignored, + as discussed in . + In the SQL standard, the xmlexists function + evaluates an expression in the XML Query language, + but PostgreSQL allows only an XPath 1.0 + expression, as discussed in + . @@ -10839,12 +10840,12 @@ SELECT xml_is_well_formed_document('test The function xpath_exists is a specialized form of the xpath function. Instead of returning the - individual XML values that satisfy the XPath, this function returns a - Boolean indicating whether the query was satisfied or not. This - function is equivalent to the standard XMLEXISTS predicate, + individual XML values that satisfy the XPath 1.0 expression, this function + returns a Boolean indicating whether the query was satisfied or not + (specifically, whether it produced any value other than an empty node-set). + This function is equivalent to the XMLEXISTS predicate, except that it also offers support for a namespace mapping argument. @@ -10949,8 +10951,8 @@ SELECT xpath_exists('/my:a/text()', 'test The xmltable function produces a table based - on the given XML value, an XPath filter to extract rows, and an - optional set of column definitions. + on the given XML value, an XPath filter to extract rows, and a + set of column definitions. @@ -10961,30 +10963,34 @@ SELECT xpath_exists('/my:a/text()', 'test - The required row_expression argument is an XPath - expression that is evaluated against the supplied XML document to - obtain an ordered sequence of XML nodes. This sequence is what - xmltable transforms into output rows. + The required row_expression argument is + an XPath 1.0 expression that is evaluated, passing the + document_expression as its context item, to + obtain a set of XML nodes. These nodes are what + xmltable transforms into output rows. No rows + will be produced if the document_expression + is null, nor if the row_expression produces + an empty node-set or any value other than a node-set. - document_expression provides the XML document to - operate on. - The argument must be a well-formed XML document; fragments/forests - are not accepted. - The BY REF or BY VALUE clauses are - accepted, as described for the xmlexists predicate, - but ignored; PostgreSQL currently passes XML by value always. + document_expression provides the context + item for the row_expression. It must be a + well-formed XML document; fragments/forests are not accepted. + The BY REF and BY VALUE clauses + are accepted but ignored, as discussed in + . + In the SQL standard, the xmltable function + evaluates expressions in the XML Query language, + but PostgreSQL allows only XPath 1.0 + expressions, as discussed in + . The mandatory COLUMNS clause specifies the list of columns in the output table. - If the COLUMNS clause is omitted, the rows in the result - set contain a single column of type xml containing the - data matched by row_expression. - If COLUMNS is specified, each entry describes a - single column. + Each entry describes a single column. See the syntax summary above for the format. The column name and type are required; the path, default and nullability clauses are optional. @@ -10992,48 +10998,92 @@ SELECT xpath_exists('/my:a/text()', 'test A column marked FOR ORDINALITY will be populated - with row numbers matching the order in which the - output rows appeared in the original input XML document. + with row numbers, starting with 1, in the order of nodes retrieved from + the row_expression's result node-set. At most one column may be marked FOR ORDINALITY. + + + XPath 1.0 does not specify an order for nodes in a node-set, so code + that relies on a particular order of the results will be + implementation-dependent. Details can be found in + . + + + + + The column_expression for a column is an + XPath 1.0 expression that is evaluated for each row, with the current + node from the row_expression result as its + context item, to find the value of the column. If + no column_expression is given, then the + column name is used as an implicit path. + + - The column_expression for a column is an XPath expression - that is evaluated for each row, relative to the result of the - row_expression, to find the value of the column. - If no column_expression is given, then the column name - is used as an implicit path. + If a column's XPath expression returns a non-XML value (limited to + string, boolean, or double in XPath 1.0) and the column has a + PostgreSQL type other than xml, the column will be set + as if by assigning the value's string representation to the PostgreSQL + type. (If the value is a boolean, its string representation is taken + to be 1 or 0 if the output + column's type category is numeric, otherwise true or + false.) - If a column's XPath expression returns multiple elements, an error - is raised. - If the expression matches an empty tag, the result is an - empty string (not NULL). - Any xsi:nil attributes are ignored. + If a column's XPath expression returns a non-empty set of XML nodes + and the column's PostgreSQL type is xml, the column will + be assigned the expression result exactly, if it is of document or + content form. + + + A result containing more than one element node at the top level, or + non-whitespace text outside of an element, is an example of content form. + An XPath result can be of neither form, for example if it returns an + attribute node selected from the element that contains it. Such a result + will be put into content form with each such disallowed node replaced by + its string value, as defined for the XPath 1.0 + string function. + + - The text body of the XML matched by the column_expression - is used as the column value. Multiple text() nodes - within an element are concatenated in order. Any child elements, - processing instructions, and comments are ignored, but the text contents - of child elements are concatenated to the result. + A non-XML result assigned to an xml output column produces + content, a single text node with the string value of the result. + An XML result assigned to a column of any other type may not have more than + one node, or an error is raised. If there is exactly one node, the column + will be set as if by assigning the node's string + value (as defined for the XPath 1.0 string function) + to the PostgreSQL type. + + + + The string value of an XML element is the concatenation, in document order, + of all text nodes contained in that element and its descendants. The string + value of an element with no descendant text nodes is an + empty string (not NULL). + Any xsi:nil attributes are ignored. Note that the whitespace-only text() node between two non-text elements is preserved, and that leading whitespace on a text() node is not flattened. + The XPath 1.0 string function may be consulted for the + rules defining the string value of other XML node types and non-XML values. + + + + The conversion rules presented here are not exactly those of the SQL + standard, as discussed in . - If the path expression does not match for a given row but - default_expression is specified, the value resulting - from evaluating that expression is used. - If no DEFAULT clause is given for the column, - the field will be set to NULL. - It is possible for a default_expression to reference - the value of output columns that appear prior to it in the column list, - so the default of one column may be based on the value of another - column. + If the path expression returns an empty node-set + (typically, when it does not match) + for a given row, the column will be set to NULL, unless + a default_expression is specified; then the + value resulting from evaluating that expression is used. @@ -11045,20 +11095,14 @@ SELECT xpath_exists('/my:a/text()', 'test - Unlike regular PostgreSQL functions, column_expression - and default_expression are not evaluated to a simple - value before calling the function. - column_expression is normally evaluated - exactly once per input row, and default_expression - is evaluated each time a default is needed for a field. - If the expression qualifies as stable or immutable the repeat + A default_expression, rather than being + evaluated immediately when xmltable is called, + is evaluated each time a default is needed for the column. + If the expression qualifies as stable or immutable, the repeat evaluation may be skipped. - Effectively xmltable behaves more like a subquery than a - function call. This means that you can usefully use volatile functions like - nextval in default_expression, and - column_expression may depend on other parts of the - XML document. + nextval in + default_expression. diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 6b23163929..f0e0fa2944 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -593,7 +593,7 @@ X085 Predefined namespace prefixes NO X086 XML namespace declarations in XMLTable NO X090 XML document predicate YES X091 XML content predicate NO -X096 XMLExists NO XPath only +X096 XMLExists NO XPath 1.0 only X100 Host language support for XML: CONTENT option NO X101 Host language support for XML: DOCUMENT option NO X110 Host language support for XML: VARCHAR mapping NO @@ -661,11 +661,11 @@ X282 XMLValidate with CONTENT option NO X283 XMLValidate with SEQUENCE option NO X284 XMLValidate: NAMESPACE without ELEMENT clause NO X286 XMLValidate: NO NAMESPACE with ELEMENT clause NO -X300 XMLTable NO XPath only +X300 XMLTable NO XPath 1.0 only X301 XMLTable: derived column list option YES X302 XMLTable: ordinality column option YES X303 XMLTable: column default option YES -X304 XMLTable: passing a context item YES +X304 XMLTable: passing a context item YES must be XML DOCUMENT X305 XMLTable: initializing an XQuery variable NO X400 Name and identifier mapping YES X410 Alter column data type: XML type YES