granicus.if.org Git - postgresql/blob - doc/src/sgml/syntax.sgml

   1 <!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.110 2006/10/21 17:12:07 tgl Exp $ -->
   2
   3 <chapter id="sql-syntax">
   4  <title>SQL Syntax</title>
   5
   6  <indexterm zone="sql-syntax">
   7   <primary>syntax</primary>
   8   <secondary>SQL</secondary>
   9  </indexterm>
  10
  11  <para>
  12   This chapter describes the syntax of SQL.  It forms the foundation
  13   for understanding the following chapters which will go into detail
  14   about how the SQL commands are applied to define and modify data.
  15  </para>
  16
  17  <para>
  18   We also advise users who are already familiar with SQL to read this
  19   chapter carefully because there are several rules and concepts that
  20   are implemented inconsistently among SQL databases or that are
  21   specific to <productname>PostgreSQL</productname>.
  22  </para>
  23
  24  <sect1 id="sql-syntax-lexical">
  25   <title>Lexical Structure</title>
  26
  27   <indexterm>
  28    <primary>token</primary>
  29   </indexterm>
  30
  31   <para>
  32    SQL input consists of a sequence of
  33    <firstterm>commands</firstterm>.  A command is composed of a
  34    sequence of <firstterm>tokens</firstterm>, terminated by a
  35    semicolon (<quote>;</quote>).  The end of the input stream also
  36    terminates a command.  Which tokens are valid depends on the syntax
  37    of the particular command.
  38   </para>
  39
  40   <para>
  41    A token can be a <firstterm>key word</firstterm>, an
  42    <firstterm>identifier</firstterm>, a <firstterm>quoted
  43    identifier</firstterm>, a <firstterm>literal</firstterm> (or
  44    constant), or a special character symbol.  Tokens are normally
  45    separated by whitespace (space, tab, newline), but need not be if
  46    there is no ambiguity (which is generally only the case if a
  47    special character is adjacent to some other token type).
  48   </para>
  49
  50   <para>
  51    Additionally, <firstterm>comments</firstterm> can occur in SQL
  52    input.  They are not tokens, they are effectively equivalent to
  53    whitespace.
  54   </para>
  55
  56    <para>
  57     For example, the following is (syntactically) valid SQL input:
  58 <programlisting>
  59 SELECT * FROM MY_TABLE;
  60 UPDATE MY_TABLE SET A = 5;
  61 INSERT INTO MY_TABLE VALUES (3, 'hi there');
  62 </programlisting>
  63     This is a sequence of three commands, one per line (although this
  64     is not required; more than one command can be on a line, and
  65     commands can usefully be split across lines).
  66    </para>
  67
  68   <para>
  69    The SQL syntax is not very consistent regarding what tokens
  70    identify commands and which are operands or parameters.  The first
  71    few tokens are generally the command name, so in the above example
  72    we would usually speak of a <quote>SELECT</quote>, an
  73    <quote>UPDATE</quote>, and an <quote>INSERT</quote> command.  But
  74    for instance the <command>UPDATE</command> command always requires
  75    a <token>SET</token> token to appear in a certain position, and
  76    this particular variation of <command>INSERT</command> also
  77    requires a <token>VALUES</token> in order to be complete.  The
  78    precise syntax rules for each command are described in <xref linkend="reference">.
  79   </para>
  80
  81   <sect2 id="sql-syntax-identifiers">
  82    <title>Identifiers and Key Words</title>
  83
  84    <indexterm zone="sql-syntax-identifiers">
  85     <primary>identifier</primary>
  86     <secondary>syntax of</secondary>
  87    </indexterm>
  88
  89    <indexterm zone="sql-syntax-identifiers">
  90     <primary>name</primary>
  91     <secondary>syntax of</secondary>
  92    </indexterm>
  93
  94    <indexterm zone="sql-syntax-identifiers">
  95     <primary>key word</primary>
  96     <secondary>syntax of</secondary>
  97    </indexterm>
  98
  99    <para>
 100     Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
 101     <token>VALUES</token> in the example above are examples of
 102     <firstterm>key words</firstterm>, that is, words that have a fixed
 103     meaning in the SQL language.  The tokens <token>MY_TABLE</token>
 104     and <token>A</token> are examples of
 105     <firstterm>identifiers</firstterm>.  They identify names of
 106     tables, columns, or other database objects, depending on the
 107     command they are used in.  Therefore they are sometimes simply
 108     called <quote>names</quote>.  Key words and identifiers have the
 109     same lexical structure, meaning that one cannot know whether a
 110     token is an identifier or a key word without knowing the language.
 111     A complete list of key words can be found in <xref
 112     linkend="sql-keywords-appendix">.
 113    </para>
 114
 115    <para>
 116     SQL identifiers and key words must begin with a letter
 117     (<literal>a</literal>-<literal>z</literal>, but also letters with
 118     diacritical marks and non-Latin letters) or an underscore
 119     (<literal>_</literal>).  Subsequent characters in an identifier or
 120     key word can be letters, underscores, digits
 121     (<literal>0</literal>-<literal>9</literal>), or dollar signs
 122     (<literal>$</>).  Note that dollar signs are not allowed in identifiers
 123     according to the letter of the SQL standard, so their use may render
 124     applications less portable.
 125     The SQL standard will not define a key word that contains
 126     digits or starts or ends with an underscore, so identifiers of this
 127     form are safe against possible conflict with future extensions of the
 128     standard.
 129    </para>
 130
 131    <para>
 132     <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
 133     The system uses no more than <symbol>NAMEDATALEN</symbol>-1
 134     characters of an identifier; longer names can be written in
 135     commands, but they will be truncated.  By default,
 136     <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
 137     length is 63. If this limit is problematic, it can be raised by
 138     changing the <symbol>NAMEDATALEN</symbol> constant in
 139     <filename>src/include/postgres_ext.h</filename>.
 140    </para>
 141
 142    <para>
 143     <indexterm>
 144      <primary>case sensitivity</primary>
 145      <secondary>of SQL commands</secondary>
 146     </indexterm>
 147     Identifier and key word names are case insensitive.  Therefore
 148 <programlisting>
 149 UPDATE MY_TABLE SET A = 5;
 150 </programlisting>
 151     can equivalently be written as
 152 <programlisting>
 153 uPDaTE my_TabLE SeT a = 5;
 154 </programlisting>
 155     A convention often used is to write key words in upper
 156     case and names in lower case, e.g.,
 157 <programlisting>
 158 UPDATE my_table SET a = 5;
 159 </programlisting>
 160    </para>
 161
 162    <para>
 163     <indexterm>
 164      <primary>quotation marks</primary>
 165      <secondary>and identifiers</secondary>
 166     </indexterm>
 167     There is a second kind of identifier:  the <firstterm>delimited
 168     identifier</firstterm> or <firstterm>quoted
 169     identifier</firstterm>.  It is formed by enclosing an arbitrary
 170     sequence of characters in double-quotes
 171     (<literal>"</literal>). <!-- " font-lock mania --> A delimited
 172     identifier is always an identifier, never a key word.  So
 173     <literal>"select"</literal> could be used to refer to a column or
 174     table named <quote>select</quote>, whereas an unquoted
 175     <literal>select</literal> would be taken as a key word and
 176     would therefore provoke a parse error when used where a table or
 177     column name is expected.  The example can be written with quoted
 178     identifiers like this:
 179 <programlisting>
 180 UPDATE "my_table" SET "a" = 5;
 181 </programlisting>
 182    </para>
 183
 184    <para>
 185     Quoted identifiers can contain any character other than a double
 186     quote itself.  (To include a double quote, write two double quotes.)
 187     This allows constructing table or column names that would
 188     otherwise not be possible, such as ones containing spaces or
 189     ampersands.  The length limitation still applies.
 190    </para>
 191
 192    <para>
 193     Quoting an identifier also makes it case-sensitive, whereas
 194     unquoted names are always folded to lower case.  For example, the
 195     identifiers <literal>FOO</literal>, <literal>foo</literal>, and
 196     <literal>"foo"</literal> are considered the same by
 197     <productname>PostgreSQL</productname>, but
 198     <literal>"Foo"</literal> and <literal>"FOO"</literal> are
 199     different from these three and each other.  (The folding of
 200     unquoted names to lower case in <productname>PostgreSQL</> is
 201     incompatible with the SQL standard, which says that unquoted names
 202     should be folded to upper case.  Thus, <literal>foo</literal>
 203     should be equivalent to <literal>"FOO"</literal> not
 204     <literal>"foo"</literal> according to the standard.  If you want
 205     to write portable applications you are advised to always quote a
 206     particular name or never quote it.)
 207    </para>
 208   </sect2>
 209
 210
 211   <sect2 id="sql-syntax-constants">
 212    <title>Constants</title>
 213
 214    <indexterm zone="sql-syntax-constants">
 215     <primary>constant</primary>
 216    </indexterm>
 217
 218    <para>
 219     There are three kinds of <firstterm>implicitly-typed
 220     constants</firstterm> in <productname>PostgreSQL</productname>:
 221     strings, bit strings, and numbers.
 222     Constants can also be specified with explicit types, which can
 223     enable more accurate representation and more efficient handling by
 224     the system. These alternatives are discussed in the following
 225     subsections.
 226    </para>
 227
 228    <sect3 id="sql-syntax-strings">
 229     <title>String Constants</title>
 230
 231     <indexterm zone="sql-syntax-strings">
 232      <primary>character string</primary>
 233      <secondary>constant</secondary>
 234     </indexterm>
 235
 236     <para>
 237      <indexterm>
 238       <primary>quotation marks</primary>
 239       <secondary>escaping</secondary>
 240      </indexterm>
 241      A string constant in SQL is an arbitrary sequence of characters
 242      bounded by single quotes (<literal>'</literal>), for example
 243      <literal>'This is a string'</literal>.  To include
 244      a single-quote character within a string constant,
 245      write two adjacent single quotes, e.g.
 246      <literal>'Dianne''s horse'</literal>.
 247      Note that this is <emphasis>not</> the same as a double-quote
 248      character (<literal>"</>).
 249     </para>
 250
 251     <para>
 252      Two string constants that are only separated by whitespace
 253      <emphasis>with at least one newline</emphasis> are concatenated
 254      and effectively treated as if the string had been written in one
 255      constant.  For example:
 256 <programlisting>
 257 SELECT 'foo'
 258 'bar';
 259 </programlisting>
 260      is equivalent to
 261 <programlisting>
 262 SELECT 'foobar';
 263 </programlisting>
 264      but
 265 <programlisting>
 266 SELECT 'foo'      'bar';
 267 </programlisting>
 268      is not valid syntax.  (This slightly bizarre behavior is specified
 269      by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
 270      following the standard.)
 271     </para>
 272
 273     <para>
 274      <indexterm>
 275       <primary>escape string syntax</primary>
 276      </indexterm>
 277      <indexterm>
 278       <primary>backslash escapes</primary>
 279      </indexterm>
 280      <productname>PostgreSQL</productname> also accepts <quote>escape</>
 281      string constants, which are an extension to the SQL standard.
 282      An escape string constant is specified by writing the letter
 283      <literal>E</literal> (upper or lower case) just before the opening single
 284      quote, e.g. <literal>E'foo'</>.  (When continuing an escape string
 285      constant across lines, write <literal>E</> only before the first opening
 286      quote.)
 287      Within an escape string, a backslash character (<literal>\</>) begins a
 288      C-like <firstterm>backslash escape</> sequence, in which the combination
 289      of backslash and following character(s) represents a special byte value.
 290      <literal>\b</literal> is a backspace,
 291      <literal>\f</literal> is a form feed,
 292      <literal>\n</literal> is a newline,
 293      <literal>\r</literal> is a carriage return,
 294      <literal>\t</literal> is a tab.
 295      Also supported are
 296      <literal>\<replaceable>digits</replaceable></literal>, where
 297      <replaceable>digits</replaceable> represents an octal byte value, and
 298      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
 299      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
 300      (It is your responsibility that the byte sequences you create are
 301      valid characters in the server character set encoding.) Any other
 302      character following a backslash is taken literally. Thus, to
 303      include a backslash character, write two backslashes (<literal>\\</>).
 304      Also, a single quote can be included in an escape string by writing
 305      <literal>\'</literal>, in addition to the normal way of <literal>''</>.
 306     </para>
 307
 308     <caution>
 309     <para>
 310      If the configuration parameter
 311      <xref linkend="guc-standard-conforming-strings"> is <literal>off</>,
 312      then <productname>PostgreSQL</productname> recognizes backslash escapes
 313      in both regular and escape string constants.  This is for backward
 314      compatibility with the historical behavior, in which backslash escapes
 315      were always recognized.
 316      Although <varname>standard_conforming_strings</> currently defaults to
 317      <literal>off</>, the default will change to <literal>on</> in a future
 318      release for improved standards compliance.  Applications are therefore
 319      encouraged to migrate away from using backslash escapes.  If you need
 320      to use a backslash escape to represent a special character, write the
 321      constant with an <literal>E</> to be sure it will be handled the same
 322      way in future releases.
 323     </para>
 324
 325     <para>
 326      In addition to <varname>standard_conforming_strings</>, the configuration
 327      parameters <xref linkend="guc-escape-string-warning"> and
 328      <xref linkend="guc-backslash-quote"> govern treatment of backslashes
 329      in string constants.
 330     </para>
 331     </caution>
 332
 333     <para>
 334      The character with the code zero cannot be in a string constant.
 335     </para>
 336    </sect3>
 337
 338    <sect3 id="sql-syntax-dollar-quoting">
 339     <title>Dollar-Quoted String Constants</title>
 340
 341      <indexterm>
 342       <primary>dollar quoting</primary>
 343      </indexterm>
 344
 345     <para>
 346      While the standard syntax for specifying string constants is usually
 347      convenient, it can be difficult to understand when the desired string
 348      contains many single quotes or backslashes, since each of those must
 349      be doubled. To allow more readable queries in such situations,
 350      <productname>PostgreSQL</productname> provides another way, called
 351      <quote>dollar quoting</quote>, to write string constants.
 352      A dollar-quoted string constant
 353      consists of a dollar sign (<literal>$</literal>), an optional
 354      <quote>tag</quote> of zero or more characters, another dollar
 355      sign, an arbitrary sequence of characters that makes up the
 356      string content, a dollar sign, the same tag that began this
 357      dollar quote, and a dollar sign. For example, here are two
 358      different ways to specify the string <quote>Dianne's horse</>
 359      using dollar quoting:
 360 <programlisting>
 361 $$Dianne's horse$$
 362 $SomeTag$Dianne's horse$SomeTag$
 363 </programlisting>
 364      Notice that inside the dollar-quoted string, single quotes can be
 365      used without needing to be escaped.  Indeed, no characters inside
 366      a dollar-quoted string are ever escaped: the string content is always
 367      written literally.  Backslashes are not special, and neither are
 368      dollar signs, unless they are part of a sequence matching the opening
 369      tag.
 370     </para>
 371
 372     <para>
 373      It is possible to nest dollar-quoted string constants by choosing
 374      different tags at each nesting level.  This is most commonly used in
 375      writing function definitions.  For example:
 376 <programlisting>
 377 $function$
 378 BEGIN
 379     RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
 380 END;
 381 $function$
 382 </programlisting>
 383      Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</> represents a
 384      dollar-quoted literal string <literal>[\t\r\n\v\\]</>, which will
 385      be recognized when the function body is executed by
 386      <productname>PostgreSQL</>.  But since the sequence does not match
 387      the outer dollar quoting delimiter <literal>$function$</>, it is
 388      just some more characters within the constant so far as the outer
 389      string is concerned.
 390     </para>
 391
 392     <para>
 393      The tag, if any, of a dollar-quoted string follows the same rules
 394      as an unquoted identifier, except that it cannot contain a dollar sign.
 395      Tags are case sensitive, so <literal>$tag$String content$tag$</literal>
 396      is correct, but <literal>$TAG$String content$tag$</literal> is not.
 397     </para>
 398
 399     <para>
 400      A dollar-quoted string that follows a keyword or identifier must
 401      be separated from it by whitespace; otherwise the dollar quoting
 402      delimiter would be taken as part of the preceding identifier.
 403     </para>
 404
 405     <para>
 406      Dollar quoting is not part of the SQL standard, but it is often a more
 407      convenient way to write complicated string literals than the
 408      standard-compliant single quote syntax.  It is particularly useful when
 409      representing string constants inside other constants, as is often needed
 410      in procedural function definitions.  With single-quote syntax, each
 411      backslash in the above example would have to be written as four
 412      backslashes, which would be reduced to two backslashes in parsing the
 413      original string constant, and then to one when the inner string constant
 414      is re-parsed during function execution.
 415     </para>
 416    </sect3>
 417
 418    <sect3 id="sql-syntax-bit-strings">
 419     <title>Bit-String Constants</title>
 420
 421     <indexterm zone="sql-syntax-bit-strings">
 422      <primary>bit string</primary>
 423      <secondary>constant</secondary>
 424     </indexterm>
 425
 426     <para>
 427      Bit-string constants look like regular string constants with a
 428      <literal>B</literal> (upper or lower case) immediately before the
 429      opening quote (no intervening whitespace), e.g.,
 430      <literal>B'1001'</literal>.  The only characters allowed within
 431      bit-string constants are <literal>0</literal> and
 432      <literal>1</literal>.
 433     </para>
 434
 435     <para>
 436      Alternatively, bit-string constants can be specified in hexadecimal
 437      notation, using a leading <literal>X</literal> (upper or lower case),
 438      e.g., <literal>X'1FF'</literal>.  This notation is equivalent to
 439      a bit-string constant with four binary digits for each hexadecimal digit.
 440     </para>
 441
 442     <para>
 443      Both forms of bit-string constant can be continued
 444      across lines in the same way as regular string constants.
 445      Dollar quoting cannot be used in a bit-string constant.
 446     </para>
 447    </sect3>
 448
 449    <sect3>
 450     <title>Numeric Constants</title>
 451
 452     <indexterm>
 453      <primary>number</primary>
 454      <secondary>constant</secondary>
 455     </indexterm>
 456
 457     <para>
 458      Numeric constants are accepted in these general forms:
 459 <synopsis>
 460 <replaceable>digits</replaceable>
 461 <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 462 <optional><replaceable>digits</replaceable></optional>.<replaceable>digits</replaceable><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
 463 <replaceable>digits</replaceable>e<optional>+-</optional><replaceable>digits</replaceable>
 464 </synopsis>
 465      where <replaceable>digits</replaceable> is one or more decimal
 466      digits (0 through 9).  At least one digit must be before or after the
 467      decimal point, if one is used.  At least one digit must follow the
 468      exponent marker (<literal>e</literal>), if one is present.
 469      There may not be any spaces or other characters embedded in the
 470      constant.  Note that any leading plus or minus sign is not actually
 471      considered part of the constant; it is an operator applied to the
 472      constant.
 473     </para>
 474
 475     <para>
 476      These are some examples of valid numeric constants:
 477 <literallayout>
 478 42
 479 3.5
 480 4.
 481 .001
 482 5e2
 483 1.925e-3
 484 </literallayout>
 485     </para>
 486
 487     <para>
 488      <indexterm><primary>integer</primary></indexterm>
 489      <indexterm><primary>bigint</primary></indexterm>
 490      <indexterm><primary>numeric</primary></indexterm>
 491      A numeric constant that contains neither a decimal point nor an
 492      exponent is initially presumed to be type <type>integer</> if its
 493      value fits in type <type>integer</> (32 bits); otherwise it is
 494      presumed to be type <type>bigint</> if its
 495      value fits in type <type>bigint</> (64 bits); otherwise it is
 496      taken to be type <type>numeric</>.  Constants that contain decimal
 497      points and/or exponents are always initially presumed to be type
 498      <type>numeric</>.
 499     </para>
 500
 501     <para>
 502      The initially assigned data type of a numeric constant is just a
 503      starting point for the type resolution algorithms.  In most cases
 504      the constant will be automatically coerced to the most
 505      appropriate type depending on context.  When necessary, you can
 506      force a numeric value to be interpreted as a specific data type
 507      by casting it.<indexterm><primary>type cast</primary></indexterm>
 508      For example, you can force a numeric value to be treated as type
 509      <type>real</> (<type>float4</>) by writing
 510
 511 <programlisting>
 512 REAL '1.23'  -- string style
 513 1.23::REAL   -- PostgreSQL (historical) style
 514 </programlisting>
 515
 516      These are actually just special cases of the general casting
 517      notations discussed next.
 518     </para>
 519    </sect3>
 520
 521    <sect3 id="sql-syntax-constants-generic">
 522     <title>Constants of Other Types</title>
 523
 524     <indexterm>
 525      <primary>data type</primary>
 526      <secondary>constant</secondary>
 527     </indexterm>
 528
 529     <para>
 530      A constant of an <emphasis>arbitrary</emphasis> type can be
 531      entered using any one of the following notations:
 532 <synopsis>
 533 <replaceable>type</replaceable> '<replaceable>string</replaceable>'
 534 '<replaceable>string</replaceable>'::<replaceable>type</replaceable>
 535 CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 536 </synopsis>
 537      The string constant's text is passed to the input conversion
 538      routine for the type called <replaceable>type</replaceable>. The
 539      result is a constant of the indicated type.  The explicit type
 540      cast may be omitted if there is no ambiguity as to the type the
 541      constant must be (for example, when it is assigned directly to a
 542      table column), in which case it is automatically coerced.
 543     </para>
 544
 545     <para>
 546      The string constant can be written using either regular SQL
 547      notation or dollar-quoting.
 548     </para>
 549
 550     <para>
 551      It is also possible to specify a type coercion using a function-like
 552      syntax:
 553 <synopsis>
 554 <replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
 555 </synopsis>
 556      but not all type names may be used in this way; see <xref
 557      linkend="sql-syntax-type-casts"> for details.
 558     </para>
 559
 560     <para>
 561      The <literal>::</literal>, <literal>CAST()</literal>, and
 562      function-call syntaxes can also be used to specify run-time type
 563      conversions of arbitrary expressions, as discussed in <xref
 564      linkend="sql-syntax-type-casts">.  But the form
 565      <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
 566      can only be used to specify the type of a literal constant.
 567      Another restriction on
 568      <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
 569      is that it does not work for array types; use <literal>::</literal>
 570      or <literal>CAST()</literal> to specify the type of an array constant.
 571     </para>
 572
 573     <para>
 574      The <literal>CAST()</> syntax conforms to SQL.  The
 575      <literal><replaceable>type</replaceable> '<replaceable>string</replaceable>'</literal>
 576      syntax is a generalization of the standard: SQL specifies this syntax only
 577      for a few data types, but <productname>PostgreSQL</productname> allows it
 578      for all types.  The syntax with
 579      <literal>::</literal> is historical <productname>PostgreSQL</productname>
 580      usage, as is the function-call syntax.
 581     </para>
 582    </sect3>
 583   </sect2>
 584
 585   <sect2 id="sql-syntax-operators">
 586    <title>Operators</title>
 587
 588    <indexterm zone="sql-syntax-operators">
 589     <primary>operator</primary>
 590     <secondary>syntax</secondary>
 591    </indexterm>
 592
 593    <para>
 594     An operator name is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
 595     (63 by default) characters from the following list:
 596 <literallayout>
 597 + - * / &lt; &gt; = ~ ! @ # % ^ &amp; | ` ?
 598 </literallayout>
 599
 600     There are a few restrictions on operator names, however:
 601     <itemizedlist>
 602      <listitem>
 603       <para>
 604        <literal>--</literal> and <literal>/*</literal> cannot appear
 605        anywhere in an operator name, since they will be taken as the
 606        start of a comment.
 607       </para>
 608      </listitem>
 609
 610      <listitem>
 611       <para>
 612        A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
 613        unless the name also contains at least one of these characters:
 614 <literallayout>
 615 ~ ! @ # % ^ &amp; | ` ?
 616 </literallayout>
 617        For example, <literal>@-</literal> is an allowed operator name,
 618        but <literal>*-</literal> is not.  This restriction allows
 619        <productname>PostgreSQL</productname> to parse SQL-compliant
 620        queries without requiring spaces between tokens.
 621       </para>
 622      </listitem>
 623     </itemizedlist>
 624    </para>
 625
 626    <para>
 627     When working with non-SQL-standard operator names, you will usually
 628     need to separate adjacent operators with spaces to avoid ambiguity.
 629     For example, if you have defined a left unary operator named <literal>@</literal>,
 630     you cannot write <literal>X*@Y</literal>; you must write
 631     <literal>X* @Y</literal> to ensure that
 632     <productname>PostgreSQL</productname> reads it as two operator names
 633     not one.
 634    </para>
 635   </sect2>
 636
 637   <sect2>
 638    <title>Special Characters</title>
 639
 640   <para>
 641    Some characters that are not alphanumeric have a special meaning
 642    that is different from being an operator.  Details on the usage can
 643    be found at the location where the respective syntax element is
 644    described.  This section only exists to advise the existence and
 645    summarize the purposes of these characters.
 646
 647    <itemizedlist>
 648     <listitem>
 649      <para>
 650       A dollar sign (<literal>$</literal>) followed by digits is used
 651       to represent a positional parameter in the body of a function
 652       definition or a prepared statement.  In other contexts the
 653       dollar sign may be part of an identifier or a dollar-quoted string
 654       constant.
 655      </para>
 656     </listitem>
 657
 658     <listitem>
 659      <para>
 660       Parentheses (<literal>()</literal>) have their usual meaning to
 661       group expressions and enforce precedence.  In some cases
 662       parentheses are required as part of the fixed syntax of a
 663       particular SQL command.
 664      </para>
 665     </listitem>
 666
 667     <listitem>
 668      <para>
 669       Brackets (<literal>[]</literal>) are used to select the elements
 670       of an array.  See <xref linkend="arrays"> for more information
 671       on arrays.
 672      </para>
 673     </listitem>
 674
 675     <listitem>
 676      <para>
 677       Commas (<literal>,</literal>) are used in some syntactical
 678       constructs to separate the elements of a list.
 679      </para>
 680     </listitem>
 681
 682     <listitem>
 683      <para>
 684       The semicolon (<literal>;</literal>) terminates an SQL command.
 685       It cannot appear anywhere within a command, except within a
 686       string constant or quoted identifier.
 687      </para>
 688     </listitem>
 689
 690     <listitem>
 691      <para>
 692       The colon (<literal>:</literal>) is used to select
 693       <quote>slices</quote> from arrays. (See <xref
 694       linkend="arrays">.)  In certain SQL dialects (such as Embedded
 695       SQL), the colon is used to prefix variable names.
 696      </para>
 697     </listitem>
 698
 699     <listitem>
 700      <para>
 701       The asterisk (<literal>*</literal>) is used in some contexts to denote
 702       all the fields of a table row or composite value.  It also
 703       has a special meaning when used as the argument of an
 704       aggregate function, namely that the aggregate does not require
 705       any explicit parameter.
 706      </para>
 707     </listitem>
 708
 709     <listitem>
 710      <para>
 711       The period (<literal>.</literal>) is used in numeric
 712       constants, and to separate schema, table, and column names.
 713      </para>
 714     </listitem>
 715    </itemizedlist>
 716
 717    </para>
 718   </sect2>
 719
 720   <sect2 id="sql-syntax-comments">
 721    <title>Comments</title>
 722
 723    <indexterm zone="sql-syntax-comments">
 724     <primary>comment</primary>
 725     <secondary sortas="SQL">in SQL</secondary>
 726    </indexterm>
 727
 728    <para>
 729     A comment is an arbitrary sequence of characters beginning with
 730     double dashes and extending to the end of the line, e.g.:
 731 <programlisting>
 732 -- This is a standard SQL comment
 733 </programlisting>
 734    </para>
 735
 736    <para>
 737     Alternatively, C-style block comments can be used:
 738 <programlisting>
 739 /* multiline comment
 740  * with nesting: /* nested block comment */
 741  */
 742 </programlisting>
 743     where the comment begins with <literal>/*</literal> and extends to
 744     the matching occurrence of <literal>*/</literal>. These block
 745     comments nest, as specified in the SQL standard but unlike C, so that one can
 746     comment out larger blocks of code that may contain existing block
 747     comments.
 748    </para>
 749
 750    <para>
 751     A comment is removed from the input stream before further syntax
 752     analysis and is effectively replaced by whitespace.
 753    </para>
 754   </sect2>
 755
 756   <sect2 id="sql-precedence">
 757    <title>Lexical Precedence</title>
 758
 759    <indexterm zone="sql-precedence">
 760     <primary>operator</primary>
 761     <secondary>precedence</secondary>
 762    </indexterm>
 763
 764    <para>
 765     <xref linkend="sql-precedence-table"> shows the precedence and
 766     associativity of the operators in <productname>PostgreSQL</>.
 767     Most operators have the same precedence and are left-associative.
 768     The precedence and associativity of the operators is hard-wired
 769     into the parser.  This may lead to non-intuitive behavior; for
 770     example the Boolean operators <literal>&lt;</> and
 771     <literal>&gt;</> have a different precedence than the Boolean
 772     operators <literal>&lt;=</> and <literal>&gt;=</>.  Also, you will
 773     sometimes need to add parentheses when using combinations of
 774     binary and unary operators.  For instance
 775 <programlisting>
 776 SELECT 5 ! - 6;
 777 </programlisting>
 778    will be parsed as
 779 <programlisting>
 780 SELECT 5 ! (- 6);
 781 </programlisting>
 782     because the parser has no idea &mdash; until it is too late
 783     &mdash; that <token>!</token> is defined as a postfix operator,
 784     not an infix one.  To get the desired behavior in this case, you
 785     must write
 786 <programlisting>
 787 SELECT (5 !) - 6;
 788 </programlisting>
 789     This is the price one pays for extensibility.
 790    </para>
 791
 792    <table id="sql-precedence-table">
 793     <title>Operator Precedence (decreasing)</title>
 794
 795     <tgroup cols="3">
 796      <thead>
 797       <row>
 798        <entry>Operator/Element</entry>
 799        <entry>Associativity</entry>
 800        <entry>Description</entry>
 801       </row>
 802      </thead>
 803
 804      <tbody>
 805       <row>
 806        <entry><token>.</token></entry>
 807        <entry>left</entry>
 808        <entry>table/column name separator</entry>
 809       </row>
 810
 811       <row>
 812        <entry><token>::</token></entry>
 813        <entry>left</entry>
 814        <entry><productname>PostgreSQL</productname>-style typecast</entry>
 815       </row>
 816
 817       <row>
 818        <entry><token>[</token> <token>]</token></entry>
 819        <entry>left</entry>
 820        <entry>array element selection</entry>
 821       </row>
 822
 823       <row>
 824        <entry><token>-</token></entry>
 825        <entry>right</entry>
 826        <entry>unary minus</entry>
 827       </row>
 828
 829       <row>
 830        <entry><token>^</token></entry>
 831        <entry>left</entry>
 832        <entry>exponentiation</entry>
 833       </row>
 834
 835       <row>
 836        <entry><token>*</token> <token>/</token> <token>%</token></entry>
 837        <entry>left</entry>
 838        <entry>multiplication, division, modulo</entry>
 839       </row>
 840
 841       <row>
 842        <entry><token>+</token> <token>-</token></entry>
 843        <entry>left</entry>
 844        <entry>addition, subtraction</entry>
 845       </row>
 846
 847       <row>
 848        <entry><token>IS</token></entry>
 849        <entry></entry>
 850        <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS UNKNOWN</>, <literal>IS NULL</></entry>
 851       </row>
 852
 853       <row>
 854        <entry><token>ISNULL</token></entry>
 855        <entry></entry>
 856        <entry>test for null</entry>
 857       </row>
 858
 859       <row>
 860        <entry><token>NOTNULL</token></entry>
 861        <entry></entry>
 862        <entry>test for not null</entry>
 863       </row>
 864
 865       <row>
 866        <entry>(any other)</entry>
 867        <entry>left</entry>
 868        <entry>all other native and user-defined operators</entry>
 869       </row>
 870
 871       <row>
 872        <entry><token>IN</token></entry>
 873        <entry></entry>
 874        <entry>set membership</entry>
 875       </row>
 876
 877       <row>
 878        <entry><token>BETWEEN</token></entry>
 879        <entry></entry>
 880        <entry>range containment</entry>
 881       </row>
 882
 883       <row>
 884        <entry><token>OVERLAPS</token></entry>
 885        <entry></entry>
 886        <entry>time interval overlap</entry>
 887       </row>
 888
 889       <row>
 890        <entry><token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
 891        <entry></entry>
 892        <entry>string pattern matching</entry>
 893       </row>
 894
 895       <row>
 896        <entry><token>&lt;</token> <token>&gt;</token></entry>
 897        <entry></entry>
 898        <entry>less than, greater than</entry>
 899       </row>
 900
 901       <row>
 902        <entry><token>=</token></entry>
 903        <entry>right</entry>
 904        <entry>equality, assignment</entry>
 905       </row>
 906
 907       <row>
 908        <entry><token>NOT</token></entry>
 909        <entry>right</entry>
 910        <entry>logical negation</entry>
 911       </row>
 912
 913       <row>
 914        <entry><token>AND</token></entry>
 915        <entry>left</entry>
 916        <entry>logical conjunction</entry>
 917       </row>
 918
 919       <row>
 920        <entry><token>OR</token></entry>
 921        <entry>left</entry>
 922        <entry>logical disjunction</entry>
 923       </row>
 924      </tbody>
 925     </tgroup>
 926    </table>
 927
 928    <para>
 929     Note that the operator precedence rules also apply to user-defined
 930     operators that have the same names as the built-in operators
 931     mentioned above.  For example, if you define a
 932     <quote>+</quote> operator for some custom data type it will have
 933     the same precedence as the built-in <quote>+</quote> operator, no
 934     matter what yours does.
 935    </para>
 936
 937    <para>
 938     When a schema-qualified operator name is used in the
 939     <literal>OPERATOR</> syntax, as for example in
 940 <programlisting>
 941 SELECT 3 OPERATOR(pg_catalog.+) 4;
 942 </programlisting>
 943     the <literal>OPERATOR</> construct is taken to have the default precedence
 944     shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator.  This is true no matter
 945     which specific operator name appears inside <literal>OPERATOR()</>.
 946    </para>
 947   </sect2>
 948  </sect1>
 949
 950  <sect1 id="sql-expressions">
 951   <title>Value Expressions</title>
 952
 953   <indexterm zone="sql-expressions">
 954    <primary>expression</primary>
 955    <secondary>syntax</secondary>
 956   </indexterm>
 957
 958   <indexterm zone="sql-expressions">
 959    <primary>value expression</primary>
 960   </indexterm>
 961
 962   <indexterm>
 963    <primary>scalar</primary>
 964    <see>expression</see>
 965   </indexterm>
 966
 967   <para>
 968    Value expressions are used in a variety of contexts, such
 969    as in the target list of the <command>SELECT</command> command, as
 970    new column values in <command>INSERT</command> or
 971    <command>UPDATE</command>, or in search conditions in a number of
 972    commands.  The result of a value expression is sometimes called a
 973    <firstterm>scalar</firstterm>, to distinguish it from the result of
 974    a table expression (which is a table).  Value expressions are
 975    therefore also called <firstterm>scalar expressions</firstterm> (or
 976    even simply <firstterm>expressions</firstterm>).  The expression
 977    syntax allows the calculation of values from primitive parts using
 978    arithmetic, logical, set, and other operations.
 979   </para>
 980
 981   <para>
 982    A value expression is one of the following:
 983
 984    <itemizedlist>
 985     <listitem>
 986      <para>
 987       A constant or literal value.
 988      </para>
 989     </listitem>
 990
 991     <listitem>
 992      <para>
 993       A column reference.
 994      </para>
 995     </listitem>
 996
 997     <listitem>
 998      <para>
 999       A positional parameter reference, in the body of a function definition
1000       or prepared statement.
1001      </para>
1002     </listitem>
1003
1004     <listitem>
1005      <para>
1006       A subscripted expression.
1007      </para>
1008     </listitem>
1009
1010     <listitem>
1011      <para>
1012       A field selection expression.
1013      </para>
1014     </listitem>
1015
1016     <listitem>
1017      <para>
1018       An operator invocation.
1019      </para>
1020     </listitem>
1021
1022     <listitem>
1023      <para>
1024       A function call.
1025      </para>
1026     </listitem>
1027
1028     <listitem>
1029      <para>
1030       An aggregate expression.
1031      </para>
1032     </listitem>
1033
1034     <listitem>
1035      <para>
1036       A type cast.
1037      </para>
1038     </listitem>
1039
1040     <listitem>
1041      <para>
1042       A scalar subquery.
1043      </para>
1044     </listitem>
1045
1046     <listitem>
1047      <para>
1048       An array constructor.
1049      </para>
1050     </listitem>
1051
1052     <listitem>
1053      <para>
1054       A row constructor.
1055      </para>
1056     </listitem>
1057
1058     <listitem>
1059      <para>
1060       Another value expression in parentheses, useful to group
1061       subexpressions and override
1062       precedence.<indexterm><primary>parenthesis</></>
1063      </para>
1064     </listitem>
1065    </itemizedlist>
1066   </para>
1067
1068   <para>
1069    In addition to this list, there are a number of constructs that can
1070    be classified as an expression but do not follow any general syntax
1071    rules.  These generally have the semantics of a function or
1072    operator and are explained in the appropriate location in <xref
1073    linkend="functions">.  An example is the <literal>IS NULL</literal>
1074    clause.
1075   </para>
1076
1077   <para>
1078    We have already discussed constants in <xref
1079    linkend="sql-syntax-constants">.  The following sections discuss
1080    the remaining options.
1081   </para>
1082
1083   <sect2>
1084    <title>Column References</title>
1085
1086    <indexterm>
1087     <primary>column reference</primary>
1088    </indexterm>
1089
1090    <para>
1091     A column can be referenced in the form
1092 <synopsis>
1093 <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable>
1094 </synopsis>
1095    </para>
1096
1097    <para>
1098     <replaceable>correlation</replaceable> is the name of a
1099     table (possibly qualified with a schema name), or an alias for a table
1100     defined by means of a <literal>FROM</literal> clause, or one of
1101     the key words <literal>NEW</literal> or <literal>OLD</literal>.
1102     (<literal>NEW</literal> and <literal>OLD</literal> can only appear in rewrite rules,
1103     while other correlation names can be used in any SQL statement.)
1104     The correlation name and separating dot may be omitted if the column name
1105     is unique across all the tables being used in the current query.  (See also <xref linkend="queries">.)
1106    </para>
1107   </sect2>
1108
1109   <sect2>
1110    <title>Positional Parameters</title>
1111
1112    <indexterm>
1113     <primary>parameter</primary>
1114     <secondary>syntax</secondary>
1115    </indexterm>
1116
1117    <indexterm>
1118     <primary>$</primary>
1119    </indexterm>
1120
1121    <para>
1122     A positional parameter reference is used to indicate a value
1123     that is supplied externally to an SQL statement.  Parameters are
1124     used in SQL function definitions and in prepared queries.  Some
1125     client libraries also support specifying data values separately
1126     from the SQL command string, in which case parameters are used to
1127     refer to the out-of-line data values.
1128     The form of a parameter reference is:
1129 <synopsis>
1130 $<replaceable>number</replaceable>
1131 </synopsis>
1132    </para>
1133
1134    <para>
1135     For example, consider the definition of a function,
1136     <function>dept</function>, as
1137
1138 <programlisting>
1139 CREATE FUNCTION dept(text) RETURNS dept
1140     AS $$ SELECT * FROM dept WHERE name = $1 $$
1141     LANGUAGE SQL;
1142 </programlisting>
1143
1144     Here the <literal>$1</literal> references the value of the first
1145     function argument whenever the function is invoked.
1146    </para>
1147   </sect2>
1148
1149   <sect2>
1150    <title>Subscripts</title>
1151
1152    <indexterm>
1153     <primary>subscript</primary>
1154    </indexterm>
1155
1156    <para>
1157     If an expression yields a value of an array type, then a specific
1158     element of the array value can be extracted by writing
1159 <synopsis>
1160 <replaceable>expression</replaceable>[<replaceable>subscript</replaceable>]
1161 </synopsis>
1162     or multiple adjacent elements (an <quote>array slice</>) can be extracted
1163     by writing
1164 <synopsis>
1165 <replaceable>expression</replaceable>[<replaceable>lower_subscript</replaceable>:<replaceable>upper_subscript</replaceable>]
1166 </synopsis>
1167     (Here, the brackets <literal>[ ]</literal> are meant to appear literally.)
1168     Each <replaceable>subscript</replaceable> is itself an expression,
1169     which must yield an integer value.
1170    </para>
1171
1172    <para>
1173     In general the array <replaceable>expression</replaceable> must be
1174     parenthesized, but the parentheses may be omitted when the expression
1175     to be subscripted is just a column reference or positional parameter.
1176     Also, multiple subscripts can be concatenated when the original array
1177     is multidimensional.
1178     For example,
1179
1180 <programlisting>
1181 mytable.arraycolumn[4]
1182 mytable.two_d_column[17][34]
1183 $1[10:42]
1184 (arrayfunction(a,b))[42]
1185 </programlisting>
1186
1187     The parentheses in the last example are required.
1188     See <xref linkend="arrays"> for more about arrays.
1189    </para>
1190   </sect2>
1191
1192   <sect2>
1193    <title>Field Selection</title>
1194
1195    <indexterm>
1196     <primary>field selection</primary>
1197    </indexterm>
1198
1199    <para>
1200     If an expression yields a value of a composite type (row type), then a
1201     specific field of the row can be extracted by writing
1202 <synopsis>
1203 <replaceable>expression</replaceable>.<replaceable>fieldname</replaceable>
1204 </synopsis>
1205    </para>
1206
1207    <para>
1208     In general the row <replaceable>expression</replaceable> must be
1209     parenthesized, but the parentheses may be omitted when the expression
1210     to be selected from is just a table reference or positional parameter.
1211     For example,
1212
1213 <programlisting>
1214 mytable.mycolumn
1215 $1.somecolumn
1216 (rowfunction(a,b)).col3
1217 </programlisting>
1218
1219     (Thus, a qualified column reference is actually just a special case
1220     of the field selection syntax.)
1221    </para>
1222   </sect2>
1223
1224   <sect2>
1225    <title>Operator Invocations</title>
1226
1227    <indexterm>
1228     <primary>operator</primary>
1229     <secondary>invocation</secondary>
1230    </indexterm>
1231
1232    <para>
1233     There are three possible syntaxes for an operator invocation:
1234     <simplelist>
1235      <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> <replaceable>expression</replaceable> (binary infix operator)</member>
1236      <member><replaceable>operator</replaceable> <replaceable>expression</replaceable> (unary prefix operator)</member>
1237      <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
1238     </simplelist>
1239     where the <replaceable>operator</replaceable> token follows the syntax
1240     rules of <xref linkend="sql-syntax-operators">, or is one of the
1241     key words <token>AND</token>, <token>OR</token>, and
1242     <token>NOT</token>, or is a qualified operator name in the form
1243 <synopsis>
1244 <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
1245 </synopsis>
1246     Which particular operators exist and whether
1247     they are unary or binary depends on what operators have been
1248     defined by the system or the user.  <xref linkend="functions">
1249     describes the built-in operators.
1250    </para>
1251   </sect2>
1252
1253   <sect2>
1254    <title>Function Calls</title>
1255
1256    <indexterm>
1257     <primary>function</primary>
1258     <secondary>invocation</secondary>
1259    </indexterm>
1260
1261    <para>
1262     The syntax for a function call is the name of a function
1263     (possibly qualified with a schema name), followed by its argument list
1264     enclosed in parentheses:
1265
1266 <synopsis>
1267 <replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
1268 </synopsis>
1269    </para>
1270
1271    <para>
1272     For example, the following computes the square root of 2:
1273 <programlisting>
1274 sqrt(2)
1275 </programlisting>
1276    </para>
1277
1278    <para>
1279     The list of built-in functions is in <xref linkend="functions">.
1280     Other functions may be added by the user.
1281    </para>
1282   </sect2>
1283
1284   <sect2 id="syntax-aggregates">
1285    <title>Aggregate Expressions</title>
1286
1287    <indexterm zone="syntax-aggregates">
1288     <primary>aggregate function</primary>
1289     <secondary>invocation</secondary>
1290    </indexterm>
1291
1292    <para>
1293     An <firstterm>aggregate expression</firstterm> represents the
1294     application of an aggregate function across the rows selected by a
1295     query.  An aggregate function reduces multiple inputs to a single
1296     output value, such as the sum or average of the inputs.  The
1297     syntax of an aggregate expression is one of the following:
1298
1299 <synopsis>
1300 <replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
1301 <replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
1302 <replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable> [ , ... ] )
1303 <replaceable>aggregate_name</replaceable> ( * )
1304 </synopsis>
1305
1306     where <replaceable>aggregate_name</replaceable> is a previously
1307     defined aggregate (possibly qualified with a schema name), and
1308     <replaceable>expression</replaceable> is
1309     any value expression that does not itself contain an aggregate
1310     expression.
1311    </para>
1312
1313    <para>
1314     The first form of aggregate expression invokes the aggregate
1315     across all input rows for which the given expression(s) yield
1316     non-null values.  (Actually, it is up to the aggregate function
1317     whether to ignore null values or not &mdash; but all the standard ones do.)
1318     The second form is the same as the first, since
1319     <literal>ALL</literal> is the default.  The third form invokes the
1320     aggregate for all distinct non-null values of the expressions found
1321     in the input rows.  The last form invokes the aggregate once for
1322     each input row regardless of null or non-null values; since no
1323     particular input value is specified, it is generally only useful
1324     for the <function>count(*)</function> aggregate function.
1325    </para>
1326
1327    <para>
1328     For example, <literal>count(*)</literal> yields the total number
1329     of input rows; <literal>count(f1)</literal> yields the number of
1330     input rows in which <literal>f1</literal> is non-null;
1331     <literal>count(distinct f1)</literal> yields the number of
1332     distinct non-null values of <literal>f1</literal>.
1333    </para>
1334
1335    <para>
1336     The predefined aggregate functions are described in <xref
1337     linkend="functions-aggregate">.  Other aggregate functions may be added
1338     by the user.
1339    </para>
1340
1341    <para>
1342     An aggregate expression may only appear in the result list or
1343     <literal>HAVING</> clause of a <command>SELECT</> command.
1344     It is forbidden in other clauses, such as <literal>WHERE</>,
1345     because those clauses are logically evaluated before the results
1346     of aggregates are formed.
1347    </para>
1348
1349    <para>
1350     When an aggregate expression appears in a subquery (see
1351     <xref linkend="sql-syntax-scalar-subqueries"> and
1352     <xref linkend="functions-subquery">), the aggregate is normally
1353     evaluated over the rows of the subquery.  But an exception occurs
1354     if the aggregate's arguments contain only outer-level variables:
1355     the aggregate then belongs to the nearest such outer level, and is
1356     evaluated over the rows of that query.  The aggregate expression
1357     as a whole is then an outer reference for the subquery it appears in,
1358     and acts as a constant over any one evaluation of that subquery.
1359     The restriction about
1360     appearing only in the result list or <literal>HAVING</> clause
1361     applies with respect to the query level that the aggregate belongs to.
1362    </para>
1363
1364    <note>
1365     <para>
1366      <productname>PostgreSQL</productname> currently does not support
1367      <literal>DISTINCT</> with more than one input expression.
1368     </para>
1369    </note>
1370   </sect2>
1371
1372   <sect2 id="sql-syntax-type-casts">
1373    <title>Type Casts</title>
1374
1375    <indexterm>
1376     <primary>data type</primary>
1377     <secondary>type cast</secondary>
1378    </indexterm>
1379
1380    <indexterm>
1381     <primary>type cast</primary>
1382    </indexterm>
1383
1384    <para>
1385     A type cast specifies a conversion from one data type to another.
1386     <productname>PostgreSQL</productname> accepts two equivalent syntaxes
1387     for type casts:
1388 <synopsis>
1389 CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable> )
1390 <replaceable>expression</replaceable>::<replaceable>type</replaceable>
1391 </synopsis>
1392     The <literal>CAST</> syntax conforms to SQL; the syntax with
1393     <literal>::</literal> is historical <productname>PostgreSQL</productname>
1394     usage.
1395    </para>
1396
1397    <para>
1398     When a cast is applied to a value expression of a known type, it
1399     represents a run-time type conversion.  The cast will succeed only
1400     if a suitable type conversion operation has been defined.  Notice that this
1401     is subtly different from the use of casts with constants, as shown in
1402     <xref linkend="sql-syntax-constants-generic">.  A cast applied to an
1403     unadorned string literal represents the initial assignment of a type
1404     to a literal constant value, and so it will succeed for any type
1405     (if the contents of the string literal are acceptable input syntax for the
1406     data type).
1407    </para>
1408
1409    <para>
1410     An explicit type cast may usually be omitted if there is no ambiguity as
1411     to the type that a value expression must produce (for example, when it is
1412     assigned to a table column); the system will automatically apply a
1413     type cast in such cases.  However, automatic casting is only done for
1414     casts that are marked <quote>OK to apply implicitly</>
1415     in the system catalogs.  Other casts must be invoked with
1416     explicit casting syntax.  This restriction is intended to prevent
1417     surprising conversions from being applied silently.
1418    </para>
1419
1420    <para>
1421     It is also possible to specify a type cast using a function-like
1422     syntax:
1423 <synopsis>
1424 <replaceable>typename</replaceable> ( <replaceable>expression</replaceable> )
1425 </synopsis>
1426     However, this only works for types whose names are also valid as
1427     function names.  For example, <literal>double precision</literal>
1428     can't be used this way, but the equivalent <literal>float8</literal>
1429     can.  Also, the names <literal>interval</>, <literal>time</>, and
1430     <literal>timestamp</> can only be used in this fashion if they are
1431     double-quoted, because of syntactic conflicts.  Therefore, the use of
1432     the function-like cast syntax leads to inconsistencies and should
1433     probably be avoided in new applications.
1434
1435     (The function-like syntax is in fact just a function call.  When
1436     one of the two standard cast syntaxes is used to do a run-time
1437     conversion, it will internally invoke a registered function to
1438     perform the conversion.  By convention, these conversion functions
1439     have the same name as their output type, and thus the <quote>function-like
1440     syntax</> is nothing more than a direct invocation of the underlying
1441     conversion function.  Obviously, this is not something that a portable
1442     application should rely on.)
1443    </para>
1444   </sect2>
1445
1446   <sect2 id="sql-syntax-scalar-subqueries">
1447    <title>Scalar Subqueries</title>
1448
1449    <indexterm>
1450     <primary>subquery</primary>
1451    </indexterm>
1452
1453    <para>
1454     A scalar subquery is an ordinary
1455     <command>SELECT</command> query in parentheses that returns exactly one
1456     row with one column.  (See <xref linkend="queries"> for information about writing queries.)
1457     The <command>SELECT</command> query is executed
1458     and the single returned value is used in the surrounding value expression.
1459     It is an error to use a query that
1460     returns more than one row or more than one column as a scalar subquery.
1461     (But if, during a particular execution, the subquery returns no rows,
1462     there is no error; the scalar result is taken to be null.)
1463     The subquery can refer to variables from the surrounding query,
1464     which will act as constants during any one evaluation of the subquery.
1465     See also <xref linkend="functions-subquery"> for other expressions involving subqueries.
1466    </para>
1467
1468    <para>
1469     For example, the following finds the largest city population in each
1470     state:
1471 <programlisting>
1472 SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
1473     FROM states;
1474 </programlisting>
1475    </para>
1476   </sect2>
1477
1478   <sect2 id="sql-syntax-array-constructors">
1479    <title>Array Constructors</title>
1480
1481    <indexterm>
1482     <primary>array</primary>
1483     <secondary>constructor</secondary>
1484    </indexterm>
1485
1486    <indexterm>
1487     <primary>ARRAY</primary>
1488    </indexterm>
1489
1490    <para>
1491     An array constructor is an expression that builds an
1492     array value from values for its member elements.  A simple array
1493     constructor
1494     consists of the key word <literal>ARRAY</literal>, a left square bracket
1495     <literal>[</>, one or more expressions (separated by commas) for the
1496     array element values, and finally a right square bracket <literal>]</>.
1497     For example,
1498 <programlisting>
1499 SELECT ARRAY[1,2,3+4];
1500   array
1501 ---------
1502  {1,2,7}
1503 (1 row)
1504 </programlisting>
1505     The array element type is the common type of the member expressions,
1506     determined using the same rules as for <literal>UNION</> or
1507     <literal>CASE</> constructs (see <xref linkend="typeconv-union-case">).
1508    </para>
1509
1510    <para>
1511     Multidimensional array values can be built by nesting array
1512     constructors.
1513     In the inner constructors, the key word <literal>ARRAY</literal> may
1514     be omitted.  For example, these produce the same result:
1515
1516 <programlisting>
1517 SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
1518      array
1519 ---------------
1520  {{1,2},{3,4}}
1521 (1 row)
1522
1523 SELECT ARRAY[[1,2],[3,4]];
1524      array
1525 ---------------
1526  {{1,2},{3,4}}
1527 (1 row)
1528 </programlisting>
1529
1530     Since multidimensional arrays must be rectangular, inner constructors
1531     at the same level must produce sub-arrays of identical dimensions.
1532   </para>
1533
1534   <para>
1535     Multidimensional array constructor elements can be anything yielding
1536     an array of the proper kind, not only a sub-<literal>ARRAY</> construct.
1537     For example:
1538 <programlisting>
1539 CREATE TABLE arr(f1 int[], f2 int[]);
1540
1541 INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);
1542
1543 SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
1544                      array
1545 ------------------------------------------------
1546  {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
1547 (1 row)
1548 </programlisting>
1549   </para>
1550
1551   <para>
1552    It is also possible to construct an array from the results of a
1553    subquery.  In this form, the array constructor is written with the
1554    key word <literal>ARRAY</literal> followed by a parenthesized (not
1555    bracketed) subquery. For example:
1556 <programlisting>
1557 SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
1558                           ?column?
1559 -------------------------------------------------------------
1560  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}
1561 (1 row)
1562 </programlisting>
1563    The subquery must return a single column. The resulting
1564    one-dimensional array will have an element for each row in the
1565    subquery result, with an element type matching that of the
1566    subquery's output column.
1567   </para>
1568
1569   <para>
1570    The subscripts of an array value built with <literal>ARRAY</literal>
1571    always begin with one.  For more information about arrays, see
1572    <xref linkend="arrays">.
1573   </para>
1574
1575   </sect2>
1576
1577   <sect2 id="sql-syntax-row-constructors">
1578    <title>Row Constructors</title>
1579
1580    <indexterm>
1581     <primary>composite type</primary>
1582     <secondary>constructor</secondary>
1583    </indexterm>
1584
1585    <indexterm>
1586     <primary>row type</primary>
1587     <secondary>constructor</secondary>
1588    </indexterm>
1589
1590    <indexterm>
1591     <primary>ROW</primary>
1592    </indexterm>
1593
1594    <para>
1595     A row constructor is an expression that builds a row value (also
1596     called a composite value) from values
1597     for its member fields.  A row constructor consists of the key word
1598     <literal>ROW</literal>, a left parenthesis, zero or more
1599     expressions (separated by commas) for the row field values, and finally
1600     a right parenthesis.  For example,
1601 <programlisting>
1602 SELECT ROW(1,2.5,'this is a test');
1603 </programlisting>
1604     The key word <literal>ROW</> is optional when there is more than one
1605     expression in the list.
1606    </para>
1607
1608    <para>
1609     A row constructor can include the syntax
1610     <replaceable>rowvalue</replaceable><literal>.*</literal>,
1611     which will be expanded to a list of the elements of the row value,
1612     just as occurs when the <literal>.*</> syntax is used at the top level
1613     of a <command>SELECT</> list.  For example, if table <literal>t</> has
1614     columns <literal>f1</> and <literal>f2</>, these are the same:
1615 <programlisting>
1616 SELECT ROW(t.*, 42) FROM t;
1617 SELECT ROW(t.f1, t.f2, 42) FROM t;
1618 </programlisting>
1619    </para>
1620
1621    <note>
1622     <para>
1623      Before <productname>PostgreSQL</productname> 8.2, the
1624      <literal>.*</literal> syntax was not expanded, so that writing
1625      <literal>ROW(t.*, 42)</> created a two-field row whose first field
1626      was another row value.  The new behavior is usually more useful.
1627      If you need the old behavior of nested row values, write the inner
1628      row value without <literal>.*</literal>, for instance
1629      <literal>ROW(t, 42)</>.
1630     </para>
1631    </note>
1632
1633    <para>
1634     By default, the value created by a <literal>ROW</> expression is of
1635     an anonymous record type.  If necessary, it can be cast to a named
1636     composite type &mdash; either the row type of a table, or a composite type
1637     created with <command>CREATE TYPE AS</>.  An explicit cast may be needed
1638     to avoid ambiguity.  For example:
1639 <programlisting>
1640 CREATE TABLE mytable(f1 int, f2 float, f3 text);
1641
1642 CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
1643
1644 -- No cast needed since only one getf1() exists
1645 SELECT getf1(ROW(1,2.5,'this is a test'));
1646  getf1
1647 -------
1648      1
1649 (1 row)
1650
1651 CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
1652
1653 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
1654
1655 -- Now we need a cast to indicate which function to call:
1656 SELECT getf1(ROW(1,2.5,'this is a test'));
1657 ERROR:  function getf1(record) is not unique
1658
1659 SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
1660  getf1
1661 -------
1662      1
1663 (1 row)
1664
1665 SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
1666  getf1
1667 -------
1668     11
1669 (1 row)
1670 </programlisting>
1671   </para>
1672
1673   <para>
1674    Row constructors can be used to build composite values to be stored
1675    in a composite-type table column, or to be passed to a function that
1676    accepts a composite parameter.  Also,
1677    it is possible to compare two row values or test a row with
1678    <literal>IS NULL</> or <literal>IS NOT NULL</>, for example
1679 <programlisting>
1680 SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');
1681
1682 SELECT ROW(table.*) IS NULL FROM table;  -- detect all-null rows
1683 </programlisting>
1684    For more detail see <xref linkend="functions-comparisons">.
1685    Row constructors can also be used in connection with subqueries,
1686    as discussed in <xref linkend="functions-subquery">.
1687   </para>
1688
1689   </sect2>
1690
1691   <sect2 id="syntax-express-eval">
1692    <title>Expression Evaluation Rules</title>
1693
1694    <indexterm>
1695     <primary>expression</primary>
1696     <secondary>order of evaluation</secondary>
1697    </indexterm>
1698
1699    <para>
1700     The order of evaluation of subexpressions is not defined.  In
1701     particular, the inputs of an operator or function are not necessarily
1702     evaluated left-to-right or in any other fixed order.
1703    </para>
1704
1705    <para>
1706     Furthermore, if the result of an expression can be determined by
1707     evaluating only some parts of it, then other subexpressions
1708     might not be evaluated at all.  For instance, if one wrote
1709 <programlisting>
1710 SELECT true OR somefunc();
1711 </programlisting>
1712     then <literal>somefunc()</literal> would (probably) not be called
1713     at all. The same would be the case if one wrote
1714 <programlisting>
1715 SELECT somefunc() OR true;
1716 </programlisting>
1717     Note that this is not the same as the left-to-right
1718     <quote>short-circuiting</quote> of Boolean operators that is found
1719     in some programming languages.
1720    </para>
1721
1722    <para>
1723     As a consequence, it is unwise to use functions with side effects
1724     as part of complex expressions.  It is particularly dangerous to
1725     rely on side effects or evaluation order in <literal>WHERE</> and <literal>HAVING</> clauses,
1726     since those clauses are extensively reprocessed as part of
1727     developing an execution plan.  Boolean
1728     expressions (<literal>AND</>/<literal>OR</>/<literal>NOT</> combinations) in those clauses may be reorganized
1729     in any manner allowed by the laws of Boolean algebra.
1730    </para>
1731
1732    <para>
1733     When it is essential to force evaluation order, a <literal>CASE</>
1734     construct (see <xref linkend="functions-conditional">) may be
1735     used.  For example, this is an untrustworthy way of trying to
1736     avoid division by zero in a <literal>WHERE</> clause:
1737 <programlisting>
1738 SELECT ... WHERE x &lt;&gt; 0 AND y/x &gt; 1.5;
1739 </programlisting>
1740     But this is safe:
1741 <programlisting>
1742 SELECT ... WHERE CASE WHEN x &lt;&gt; 0 THEN y/x &gt; 1.5 ELSE false END;
1743 </programlisting>
1744     A <literal>CASE</> construct used in this fashion will defeat optimization
1745     attempts, so it should only be done when necessary.  (In this particular
1746     example, it would doubtless be best to sidestep the problem by writing
1747     <literal>y &gt; 1.5*x</> instead.)
1748    </para>
1749   </sect2>
1750  </sect1>
1751
1752 </chapter>