granicus.if.org Git - postgresql/blob - doc/src/sgml/xfunc.sgml

   1 <!-- $PostgreSQL: pgsql/doc/src/sgml/xfunc.sgml,v 1.151 2010/07/26 20:14:05 petere Exp $ -->
   2
   3  <sect1 id="xfunc">
   4   <title>User-Defined Functions</title>
   5
   6   <indexterm zone="xfunc">
   7    <primary>function</primary>
   8    <secondary>user-defined</secondary>
   9   </indexterm>
  10
  11   <para>
  12    <productname>PostgreSQL</productname> provides four kinds of
  13    functions:
  14
  15    <itemizedlist>
  16     <listitem>
  17      <para>
  18       query language functions (functions written in
  19       <acronym>SQL</acronym>) (<xref linkend="xfunc-sql">)
  20      </para>
  21     </listitem>
  22     <listitem>
  23      <para>
  24       procedural language functions (functions written in, for
  25       example, <application>PL/pgSQL</> or <application>PL/Tcl</>)
  26       (<xref linkend="xfunc-pl">)
  27      </para>
  28     </listitem>
  29     <listitem>
  30      <para>
  31       internal functions (<xref linkend="xfunc-internal">)
  32      </para>
  33     </listitem>
  34     <listitem>
  35      <para>
  36       C-language functions (<xref linkend="xfunc-c">)
  37      </para>
  38     </listitem>
  39    </itemizedlist>
  40   </para>
  41
  42   <para>
  43    Every kind
  44    of  function  can take base types, composite types, or
  45    combinations of these as arguments (parameters). In addition,
  46    every kind of function can return a base type or
  47    a composite type.  Functions can also be defined to return
  48    sets of base or composite values.
  49   </para>
  50
  51   <para>
  52    Many kinds of functions can take or return certain pseudo-types
  53    (such as polymorphic types), but the available facilities vary.
  54    Consult the description of each kind of function for more details.
  55   </para>
  56
  57   <para>
  58    It's easiest to define <acronym>SQL</acronym>
  59    functions, so we'll start by discussing those.
  60    Most of the concepts presented for <acronym>SQL</acronym> functions
  61    will carry over to the other types of functions.
  62   </para>
  63
  64   <para>
  65    Throughout this chapter, it can be useful to look at the reference
  66    page of the <xref linkend="sql-createfunction"> command to
  67    understand the examples better.  Some examples from this chapter
  68    can be found in <filename>funcs.sql</filename> and
  69    <filename>funcs.c</filename> in the <filename>src/tutorial</>
  70    directory in the <productname>PostgreSQL</productname> source
  71    distribution.
  72   </para>
  73   </sect1>
  74
  75   <sect1 id="xfunc-sql">
  76    <title>Query Language (<acronym>SQL</acronym>) Functions</title>
  77
  78    <indexterm zone="xfunc-sql">
  79     <primary>function</primary>
  80     <secondary>user-defined</secondary>
  81     <tertiary>in SQL</tertiary>
  82    </indexterm>
  83
  84    <para>
  85     SQL functions execute an arbitrary list of SQL statements, returning
  86     the result of the last query in the list.
  87     In the simple (non-set)
  88     case, the first row of the last query's result will be returned.
  89     (Bear in mind that <quote>the first row</quote> of a multirow
  90     result is not well-defined unless you use <literal>ORDER BY</>.)
  91     If the last query happens
  92     to return no rows at all, the null value will be returned.
  93    </para>
  94
  95    <para>
  96     Alternatively, an SQL function can be declared to return a set,
  97     by specifying the function's return type as <literal>SETOF
  98     <replaceable>sometype</></literal>, or equivalently by declaring it as
  99     <literal>RETURNS TABLE(<replaceable>columns</>)</literal>.  In this case
 100     all rows of the last query's result are returned.  Further details appear
 101     below.
 102    </para>
 103
 104    <para>
 105     The body of an SQL function must be a list of SQL
 106     statements separated by semicolons.  A semicolon after the last
 107     statement is optional.  Unless the function is declared to return
 108     <type>void</>, the last statement must be a <command>SELECT</>,
 109     or an <command>INSERT</>, <command>UPDATE</>, or <command>DELETE</>
 110     that has a <literal>RETURNING</> clause.
 111    </para>
 112
 113     <para>
 114      Any collection of commands in the  <acronym>SQL</acronym>
 115      language can be packaged together and defined as a function.
 116      Besides <command>SELECT</command> queries, the commands can include data
 117      modification queries (<command>INSERT</command>,
 118      <command>UPDATE</command>, and <command>DELETE</command>), as well as
 119      other SQL commands. (The only exception is that you cannot put
 120      <command>BEGIN</>, <command>COMMIT</>, <command>ROLLBACK</>, or
 121      <command>SAVEPOINT</> commands into a <acronym>SQL</acronym> function.)
 122      However, the final command
 123      must be a <command>SELECT</command> or have a <literal>RETURNING</>
 124      clause that returns whatever is
 125      specified as the function's return type.  Alternatively, if you
 126      want to define a SQL function that performs actions but has no
 127      useful value to return, you can define it as returning <type>void</>.
 128      For example, this function removes rows with negative salaries from
 129      the <literal>emp</> table:
 130
 131 <screen>
 132 CREATE FUNCTION clean_emp() RETURNS void AS '
 133     DELETE FROM emp
 134         WHERE salary &lt; 0;
 135 ' LANGUAGE SQL;
 136
 137 SELECT clean_emp();
 138
 139  clean_emp
 140 -----------
 141
 142 (1 row)
 143 </screen>
 144     </para>
 145
 146    <para>
 147     The syntax of the <command>CREATE FUNCTION</command> command requires
 148     the function body to be written as a string constant.  It is usually
 149     most convenient to use dollar quoting (see <xref
 150     linkend="sql-syntax-dollar-quoting">) for the string constant.
 151     If you choose to use regular single-quoted string constant syntax,
 152     you must double single quote marks (<literal>'</>) and backslashes
 153     (<literal>\</>) (assuming escape string syntax) in the body of
 154     the function (see <xref linkend="sql-syntax-strings">).
 155    </para>
 156
 157    <para>
 158     Arguments to the SQL function are referenced in the function
 159     body using the syntax <literal>$<replaceable>n</></>: <literal>$1</>
 160     refers to the first argument, <literal>$2</> to the second, and so on.
 161     If an argument is of a composite type, then the dot notation,
 162     e.g., <literal>$1.name</literal>, can be used to access attributes
 163     of the argument.  The arguments can only be used as data values,
 164     not as identifiers.  Thus for example this is reasonable:
 165 <programlisting>
 166 INSERT INTO mytable VALUES ($1);
 167 </programlisting>
 168 but this will not work:
 169 <programlisting>
 170 INSERT INTO $1 VALUES (42);
 171 </programlisting>
 172    </para>
 173
 174    <sect2 id="xfunc-sql-base-functions">
 175     <title><acronym>SQL</acronym> Functions on Base Types</title>
 176
 177     <para>
 178      The simplest possible <acronym>SQL</acronym> function has no arguments and
 179      simply returns a base type, such as <type>integer</type>:
 180
 181 <screen>
 182 CREATE FUNCTION one() RETURNS integer AS $$
 183     SELECT 1 AS result;
 184 $$ LANGUAGE SQL;
 185
 186 -- Alternative syntax for string literal:
 187 CREATE FUNCTION one() RETURNS integer AS '
 188     SELECT 1 AS result;
 189 ' LANGUAGE SQL;
 190
 191 SELECT one();
 192
 193  one
 194 -----
 195    1
 196 </screen>
 197     </para>
 198
 199     <para>
 200      Notice that we defined a column alias within the function body for the result of the function
 201      (with  the  name <literal>result</>),  but this column alias is not visible
 202      outside the function.  Hence,  the  result  is labeled <literal>one</>
 203      instead of <literal>result</>.
 204     </para>
 205
 206     <para>
 207      It is almost as easy to define <acronym>SQL</acronym> functions
 208      that take base types as arguments.  In the example below, notice
 209      how we refer to the arguments within the function as <literal>$1</>
 210      and <literal>$2</>.
 211
 212 <screen>
 213 CREATE FUNCTION add_em(integer, integer) RETURNS integer AS $$
 214     SELECT $1 + $2;
 215 $$ LANGUAGE SQL;
 216
 217 SELECT add_em(1, 2) AS answer;
 218
 219  answer
 220 --------
 221       3
 222 </screen>
 223     </para>
 224
 225     <para>
 226      Here is a more useful function, which might be used to debit a
 227      bank account:
 228
 229 <programlisting>
 230 CREATE FUNCTION tf1 (integer, numeric) RETURNS integer AS $$
 231     UPDATE bank
 232         SET balance = balance - $2
 233         WHERE accountno = $1;
 234     SELECT 1;
 235 $$ LANGUAGE SQL;
 236 </programlisting>
 237
 238      A user could execute this function to debit account 17 by $100.00 as
 239      follows:
 240
 241 <programlisting>
 242 SELECT tf1(17, 100.0);
 243 </programlisting>
 244     </para>
 245
 246     <para>
 247      In practice one would probably like a more useful result from the
 248      function than a constant 1, so a more likely definition
 249      is:
 250
 251 <programlisting>
 252 CREATE FUNCTION tf1 (integer, numeric) RETURNS numeric AS $$
 253     UPDATE bank
 254         SET balance = balance - $2
 255         WHERE accountno = $1;
 256     SELECT balance FROM bank WHERE accountno = $1;
 257 $$ LANGUAGE SQL;
 258 </programlisting>
 259
 260      which adjusts the balance and returns the new balance.
 261      The same thing could be done in one command using <literal>RETURNING</>:
 262
 263 <programlisting>
 264 CREATE FUNCTION tf1 (integer, numeric) RETURNS numeric AS $$
 265     UPDATE bank
 266         SET balance = balance - $2
 267         WHERE accountno = $1
 268     RETURNING balance;
 269 $$ LANGUAGE SQL;
 270 </programlisting>
 271     </para>
 272    </sect2>
 273
 274    <sect2>
 275     <title><acronym>SQL</acronym> Functions on Composite Types</title>
 276
 277     <para>
 278      When writing  functions with arguments of composite
 279      types, we must  not  only  specify  which
 280      argument  we  want (as we did above with <literal>$1</> and <literal>$2</literal>) but
 281      also the desired attribute (field) of  that  argument.   For  example,
 282      suppose that
 283      <type>emp</type> is a table containing employee data, and therefore
 284      also the name of the composite type of each row of the table.  Here
 285      is a function <function>double_salary</function> that computes what someone's
 286      salary would be if it were doubled:
 287
 288 <screen>
 289 CREATE TABLE emp (
 290     name        text,
 291     salary      numeric,
 292     age         integer,
 293     cubicle     point
 294 );
 295
 296 INSERT INTO emp VALUES ('Bill', 4200, 45, '(2,1)');
 297
 298 CREATE FUNCTION double_salary(emp) RETURNS numeric AS $$
 299     SELECT $1.salary * 2 AS salary;
 300 $$ LANGUAGE SQL;
 301
 302 SELECT name, double_salary(emp.*) AS dream
 303     FROM emp
 304     WHERE emp.cubicle ~= point '(2,1)';
 305
 306  name | dream
 307 ------+-------
 308  Bill |  8400
 309 </screen>
 310     </para>
 311
 312     <para>
 313      Notice the use of the syntax <literal>$1.salary</literal>
 314      to select one field of the argument row value.  Also notice
 315      how the calling <command>SELECT</> command uses <literal>*</>
 316      to select
 317      the entire current row of a table as a composite value.  The table
 318      row can alternatively be referenced using just the table name,
 319      like this:
 320 <screen>
 321 SELECT name, double_salary(emp) AS dream
 322     FROM emp
 323     WHERE emp.cubicle ~= point '(2,1)';
 324 </screen>
 325      but this usage is deprecated since it's easy to get confused.
 326     </para>
 327
 328     <para>
 329      Sometimes it is handy to construct a composite argument value
 330      on-the-fly.  This can be done with the <literal>ROW</> construct.
 331      For example, we could adjust the data being passed to the function:
 332 <screen>
 333 SELECT name, double_salary(ROW(name, salary*1.1, age, cubicle)) AS dream
 334     FROM emp;
 335 </screen>
 336     </para>
 337
 338     <para>
 339      It is also possible to build a function that returns a composite type.
 340      This is an example of a function
 341      that returns a single <type>emp</type> row:
 342
 343 <programlisting>
 344 CREATE FUNCTION new_emp() RETURNS emp AS $$
 345     SELECT text 'None' AS name,
 346         1000.0 AS salary,
 347         25 AS age,
 348         point '(2,2)' AS cubicle;
 349 $$ LANGUAGE SQL;
 350 </programlisting>
 351
 352      In this example we have specified each of  the  attributes
 353      with  a  constant value, but any computation
 354      could have been substituted for these constants.
 355     </para>
 356
 357     <para>
 358      Note two important things about defining the function:
 359
 360      <itemizedlist>
 361       <listitem>
 362        <para>
 363         The select list order in the query must be exactly the same as
 364         that in which the columns appear in the table associated
 365         with the composite type.  (Naming the columns, as we did above,
 366         is irrelevant to the system.)
 367        </para>
 368       </listitem>
 369       <listitem>
 370        <para>
 371         You must typecast the expressions to match the
 372         definition of the composite type, or you will get errors like this:
 373 <screen>
 374 <computeroutput>
 375 ERROR:  function declared to return emp returns varchar instead of text at column 1
 376 </computeroutput>
 377 </screen>
 378        </para>
 379       </listitem>
 380      </itemizedlist>
 381     </para>
 382
 383     <para>
 384      A different way to define the same function is:
 385
 386 <programlisting>
 387 CREATE FUNCTION new_emp() RETURNS emp AS $$
 388     SELECT ROW('None', 1000.0, 25, '(2,2)')::emp;
 389 $$ LANGUAGE SQL;
 390 </programlisting>
 391
 392      Here we wrote a <command>SELECT</> that returns just a single
 393      column of the correct composite type.  This isn't really better
 394      in this situation, but it is a handy alternative in some cases
 395      &mdash; for example, if we need to compute the result by calling
 396      another function that returns the desired composite value.
 397     </para>
 398
 399     <para>
 400      We could call this function directly in either of two ways:
 401
 402 <screen>
 403 SELECT new_emp();
 404
 405          new_emp
 406 --------------------------
 407  (None,1000.0,25,"(2,2)")
 408
 409 SELECT * FROM new_emp();
 410
 411  name | salary | age | cubicle
 412 ------+--------+-----+---------
 413  None | 1000.0 |  25 | (2,2)
 414 </screen>
 415
 416      The second way is described more fully in <xref
 417      linkend="xfunc-sql-table-functions">.
 418     </para>
 419
 420     <para>
 421      When you use a function that returns a composite type,
 422      you might want only one field (attribute) from its result.
 423      You can do that with syntax like this:
 424
 425 <screen>
 426 SELECT (new_emp()).name;
 427
 428  name
 429 ------
 430  None
 431 </screen>
 432
 433      The extra parentheses are needed to keep the parser from getting
 434      confused.  If you try to do it without them, you get something like this:
 435
 436 <screen>
 437 SELECT new_emp().name;
 438 ERROR:  syntax error at or near "."
 439 LINE 1: SELECT new_emp().name;
 440                         ^
 441 </screen>
 442     </para>
 443
 444     <para>
 445      Another option is to use
 446      functional notation for extracting an attribute.  The  simple  way
 447      to explain this is that we can use the
 448      notations <literal>attribute(table)</>  and  <literal>table.attribute</>
 449      interchangeably.
 450
 451 <screen>
 452 SELECT name(new_emp());
 453
 454  name
 455 ------
 456  None
 457 </screen>
 458
 459 <screen>
 460 -- This is the same as:
 461 -- SELECT emp.name AS youngster FROM emp WHERE emp.age &lt; 30;
 462
 463 SELECT name(emp) AS youngster FROM emp WHERE age(emp) &lt; 30;
 464
 465  youngster
 466 -----------
 467  Sam
 468  Andy
 469 </screen>
 470     </para>
 471
 472     <tip>
 473      <para>
 474       The equivalence between functional notation and attribute notation
 475       makes it possible to use functions on composite types to emulate
 476       <quote>computed fields</>.
 477       <indexterm>
 478        <primary>computed field</primary>
 479       </indexterm>
 480       <indexterm>
 481        <primary>field</primary>
 482        <secondary>computed</secondary>
 483       </indexterm>
 484       For example, using the previous definition
 485       for <literal>double_salary(emp)</>, we can write
 486
 487 <screen>
 488 SELECT emp.name, emp.double_salary FROM emp;
 489 </screen>
 490
 491       An application using this wouldn't need to be directly aware that
 492       <literal>double_salary</> isn't a real column of the table.
 493       (You can also emulate computed fields with views.)
 494      </para>
 495     </tip>
 496
 497     <para>
 498      Another way to use a function returning a composite type is to pass the
 499      result to another function that accepts the correct row type as input:
 500
 501 <screen>
 502 CREATE FUNCTION getname(emp) RETURNS text AS $$
 503     SELECT $1.name;
 504 $$ LANGUAGE SQL;
 505
 506 SELECT getname(new_emp());
 507  getname
 508 ---------
 509  None
 510 (1 row)
 511 </screen>
 512     </para>
 513
 514     <para>
 515      Still another way to use a function that returns a composite type is to
 516      call it as a table function, as described in <xref
 517      linkend="xfunc-sql-table-functions">.
 518     </para>
 519    </sect2>
 520
 521    <sect2 id="xfunc-named-parameters">
 522     <title><acronym>SQL</> Functions with Parameter Names</title>
 523
 524    <indexterm>
 525     <primary>function</primary>
 526     <secondary>named parameter</secondary>
 527    </indexterm>
 528
 529     <para>
 530      It is possible to attach names to a function's parameters, for example
 531
 532 <programlisting>
 533 CREATE FUNCTION tf1 (acct_no integer, debit numeric) RETURNS numeric AS $$
 534     UPDATE bank
 535         SET balance = balance - $2
 536         WHERE accountno = $1
 537     RETURNING balance;
 538 $$ LANGUAGE SQL;
 539 </programlisting>
 540
 541      Here the first parameter has been given the name <literal>acct_no</>,
 542      and the second parameter the name <literal>debit</>.
 543      So far as the SQL function itself is concerned, these names are just
 544      decoration; you must still refer to the parameters as <literal>$1</>,
 545      <literal>$2</>, etc within the function body.  (Some procedural
 546      languages let you use the parameter names instead.)  However,
 547      attaching names to the parameters is useful for documentation purposes.
 548      When a function has many parameters, it is also useful to use the names
 549      while calling the function, as described in
 550      <xref linkend="sql-syntax-calling-funcs">.
 551     </para>
 552    </sect2>
 553
 554    <sect2 id="xfunc-output-parameters">
 555     <title><acronym>SQL</> Functions with Output Parameters</title>
 556
 557    <indexterm>
 558     <primary>function</primary>
 559     <secondary>output parameter</secondary>
 560    </indexterm>
 561
 562     <para>
 563      An alternative way of describing a function's results is to define it
 564      with <firstterm>output parameters</>, as in this example:
 565
 566 <screen>
 567 CREATE FUNCTION add_em (IN x int, IN y int, OUT sum int)
 568 AS 'SELECT $1 + $2'
 569 LANGUAGE SQL;
 570
 571 SELECT add_em(3,7);
 572  add_em
 573 --------
 574      10
 575 (1 row)
 576 </screen>
 577
 578      This is not essentially different from the version of <literal>add_em</>
 579      shown in <xref linkend="xfunc-sql-base-functions">.  The real value of
 580      output parameters is that they provide a convenient way of defining
 581      functions that return several columns.  For example,
 582
 583 <screen>
 584 CREATE FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int)
 585 AS 'SELECT $1 + $2, $1 * $2'
 586 LANGUAGE SQL;
 587
 588  SELECT * FROM sum_n_product(11,42);
 589  sum | product
 590 -----+---------
 591   53 |     462
 592 (1 row)
 593 </screen>
 594
 595      What has essentially happened here is that we have created an anonymous
 596      composite type for the result of the function.  The above example has
 597      the same end result as
 598
 599 <screen>
 600 CREATE TYPE sum_prod AS (sum int, product int);
 601
 602 CREATE FUNCTION sum_n_product (int, int) RETURNS sum_prod
 603 AS 'SELECT $1 + $2, $1 * $2'
 604 LANGUAGE SQL;
 605 </screen>
 606
 607      but not having to bother with the separate composite type definition
 608      is often handy.  Notice that the names attached to the output parameters
 609      are not just decoration, but determine the column names of the anonymous
 610      composite type.  (If you omit a name for an output parameter, the
 611      system will choose a name on its own.)
 612     </para>
 613
 614     <para>
 615      Notice that output parameters are not included in the calling argument
 616      list when invoking such a function from SQL.  This is because
 617      <productname>PostgreSQL</productname> considers only the input
 618      parameters to define the function's calling signature.  That means
 619      also that only the input parameters matter when referencing the function
 620      for purposes such as dropping it.  We could drop the above function
 621      with either of
 622
 623 <screen>
 624 DROP FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int);
 625 DROP FUNCTION sum_n_product (int, int);
 626 </screen>
 627     </para>
 628
 629     <para>
 630      Parameters can be marked as <literal>IN</> (the default),
 631      <literal>OUT</>, <literal>INOUT</>, or <literal>VARIADIC</>.
 632      An <literal>INOUT</>
 633      parameter serves as both an input parameter (part of the calling
 634      argument list) and an output parameter (part of the result record type).
 635      <literal>VARIADIC</> parameters are input parameters, but are treated
 636      specially as described next.
 637     </para>
 638    </sect2>
 639
 640    <sect2 id="xfunc-sql-variadic-functions">
 641     <title><acronym>SQL</> Functions with Variable Numbers of Arguments</title>
 642
 643     <indexterm>
 644      <primary>function</primary>
 645      <secondary>variadic</secondary>
 646     </indexterm>
 647
 648     <indexterm>
 649      <primary>variadic function</primary>
 650     </indexterm>
 651
 652     <para>
 653      <acronym>SQL</acronym> functions can be declared to accept
 654      variable numbers of arguments, so long as all the <quote>optional</>
 655      arguments are of the same data type.  The optional arguments will be
 656      passed to the function as an array.  The function is declared by
 657      marking the last parameter as <literal>VARIADIC</>; this parameter
 658      must be declared as being of an array type.  For example:
 659
 660 <screen>
 661 CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
 662     SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
 663 $$ LANGUAGE SQL;
 664
 665 SELECT mleast(10, -1, 5, 4.4);
 666  mleast
 667 --------
 668      -1
 669 (1 row)
 670 </screen>
 671
 672      Effectively, all the actual arguments at or beyond the
 673      <literal>VARIADIC</> position are gathered up into a one-dimensional
 674      array, as if you had written
 675
 676 <screen>
 677 SELECT mleast(ARRAY[10, -1, 5, 4.4]);    -- doesn't work
 678 </screen>
 679
 680      You can't actually write that, though &mdash; or at least, it will
 681      not match this function definition.  A parameter marked
 682      <literal>VARIADIC</> matches one or more occurrences of its element
 683      type, not of its own type.
 684     </para>
 685
 686     <para>
 687      Sometimes it is useful to be able to pass an already-constructed array
 688      to a variadic function; this is particularly handy when one variadic
 689      function wants to pass on its array parameter to another one.  You can
 690      do that by specifying <literal>VARIADIC</> in the call:
 691
 692 <screen>
 693 SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
 694 </screen>
 695
 696      This prevents expansion of the function's variadic parameter into its
 697      element type, thereby allowing the array argument value to match
 698      normally.  <literal>VARIADIC</> can only be attached to the last
 699      actual argument of a function call.
 700     </para>
 701
 702     <para>
 703      The array element parameters generated from a variadic parameter are
 704      treated as not having any names of their own.  This means it is not
 705      possible to call a variadic function using named arguments (<xref
 706      linkend="sql-syntax-calling-funcs">), except when you specify
 707      <literal>VARIADIC</>.  For example, this will work:
 708
 709 <screen>
 710 SELECT mleast(VARIADIC arr := ARRAY[10, -1, 5, 4.4]);
 711 </screen>
 712
 713      but not these:
 714
 715 <screen>
 716 SELECT mleast(arr := 10);
 717 SELECT mleast(arr := ARRAY[10, -1, 5, 4.4]);
 718 </screen>
 719     </para>
 720    </sect2>
 721
 722    <sect2 id="xfunc-sql-parameter-defaults">
 723     <title><acronym>SQL</> Functions with Default Values for Arguments</title>
 724
 725     <indexterm>
 726      <primary>function</primary>
 727      <secondary>default values for arguments</secondary>
 728     </indexterm>
 729
 730     <para>
 731      Functions can be declared with default values for some or all input
 732      arguments.  The default values are inserted whenever the function is
 733      called with insufficiently many actual arguments.  Since arguments
 734      can only be omitted from the end of the actual argument list, all
 735      parameters after a parameter with a default value have to have
 736      default values as well.  (Although the use of named argument notation
 737      could allow this restriction to be relaxed, it's still enforced so that
 738      positional argument notation works sensibly.)
 739     </para>
 740
 741     <para>
 742      For example:
 743 <screen>
 744 CREATE FUNCTION foo(a int, b int DEFAULT 2, c int DEFAULT 3)
 745 RETURNS int
 746 LANGUAGE SQL
 747 AS $$
 748     SELECT $1 + $2 + $3;
 749 $$;
 750
 751 SELECT foo(10, 20, 30);
 752  foo
 753 -----
 754   60
 755 (1 row)
 756
 757 SELECT foo(10, 20);
 758  foo
 759 -----
 760   33
 761 (1 row)
 762
 763 SELECT foo(10);
 764  foo
 765 -----
 766   15
 767 (1 row)
 768
 769 SELECT foo();  -- fails since there is no default for the first argument
 770 ERROR:  function foo() does not exist
 771 </screen>
 772      The <literal>=</literal> sign can also be used in place of the
 773      key word <literal>DEFAULT</literal>.
 774     </para>
 775    </sect2>
 776
 777    <sect2 id="xfunc-sql-table-functions">
 778     <title><acronym>SQL</acronym> Functions as Table Sources</title>
 779
 780     <para>
 781      All SQL functions can be used in the <literal>FROM</> clause of a query,
 782      but it is particularly useful for functions returning composite types.
 783      If the function is defined to return a base type, the table function
 784      produces a one-column table.  If the function is defined to return
 785      a composite type, the table function produces a column for each attribute
 786      of the composite type.
 787     </para>
 788
 789     <para>
 790      Here is an example:
 791
 792 <screen>
 793 CREATE TABLE foo (fooid int, foosubid int, fooname text);
 794 INSERT INTO foo VALUES (1, 1, 'Joe');
 795 INSERT INTO foo VALUES (1, 2, 'Ed');
 796 INSERT INTO foo VALUES (2, 1, 'Mary');
 797
 798 CREATE FUNCTION getfoo(int) RETURNS foo AS $$
 799     SELECT * FROM foo WHERE fooid = $1;
 800 $$ LANGUAGE SQL;
 801
 802 SELECT *, upper(fooname) FROM getfoo(1) AS t1;
 803
 804  fooid | foosubid | fooname | upper
 805 -------+----------+---------+-------
 806      1 |        1 | Joe     | JOE
 807 (1 row)
 808 </screen>
 809
 810      As the example shows, we can work with the columns of the function's
 811      result just the same as if they were columns of a regular table.
 812     </para>
 813
 814     <para>
 815      Note that we only got one row out of the function.  This is because
 816      we did not use <literal>SETOF</>.  That is described in the next section.
 817     </para>
 818    </sect2>
 819
 820    <sect2 id="xfunc-sql-functions-returning-set">
 821     <title><acronym>SQL</acronym> Functions Returning Sets</title>
 822
 823     <indexterm>
 824      <primary>function</primary>
 825      <secondary>with SETOF</secondary>
 826     </indexterm>
 827
 828     <para>
 829      When an SQL function is declared as returning <literal>SETOF
 830      <replaceable>sometype</></literal>, the function's final
 831      query is executed to completion, and each row it
 832      outputs is returned as an element of the result set.
 833     </para>
 834
 835     <para>
 836      This feature is normally used when calling the function in the <literal>FROM</>
 837      clause.  In this case each row returned by the function becomes
 838      a row of the table seen by the query.  For example, assume that
 839      table <literal>foo</> has the same contents as above, and we say:
 840
 841 <programlisting>
 842 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
 843     SELECT * FROM foo WHERE fooid = $1;
 844 $$ LANGUAGE SQL;
 845
 846 SELECT * FROM getfoo(1) AS t1;
 847 </programlisting>
 848
 849      Then we would get:
 850 <screen>
 851  fooid | foosubid | fooname
 852 -------+----------+---------
 853      1 |        1 | Joe
 854      1 |        2 | Ed
 855 (2 rows)
 856 </screen>
 857     </para>
 858
 859     <para>
 860      It is also possible to return multiple rows with the columns defined by
 861      output parameters, like this:
 862
 863 <programlisting>
 864 CREATE TABLE tab (y int, z int);
 865 INSERT INTO tab VALUES (1, 2), (3, 4), (5, 6), (7, 8);
 866
 867 CREATE FUNCTION sum_n_product_with_tab (x int, OUT sum int, OUT product int)
 868 RETURNS SETOF record
 869 AS $$
 870     SELECT $1 + tab.y, $1 * tab.y FROM tab;
 871 $$ LANGUAGE SQL;
 872
 873 SELECT * FROM sum_n_product_with_tab(10);
 874  sum | product
 875 -----+---------
 876   11 |      10
 877   13 |      30
 878   15 |      50
 879   17 |      70
 880 (4 rows)
 881 </programlisting>
 882
 883      The key point here is that you must write <literal>RETURNS SETOF record</>
 884      to indicate that the function returns multiple rows instead of just one.
 885      If there is only one output parameter, write that parameter's type
 886      instead of <type>record</>.
 887     </para>
 888
 889     <para>
 890      Currently, functions returning sets can also be called in the select list
 891      of a query.  For each row that the query
 892      generates by itself, the function returning set is invoked, and an output
 893      row is generated for each element of the function's result set. Note,
 894      however, that this capability is deprecated and might be removed in future
 895      releases. The following is an example function returning a set from the
 896      select list:
 897
 898 <screen>
 899 CREATE FUNCTION listchildren(text) RETURNS SETOF text AS $$
 900     SELECT name FROM nodes WHERE parent = $1
 901 $$ LANGUAGE SQL;
 902
 903 SELECT * FROM nodes;
 904    name    | parent
 905 -----------+--------
 906  Top       |
 907  Child1    | Top
 908  Child2    | Top
 909  Child3    | Top
 910  SubChild1 | Child1
 911  SubChild2 | Child1
 912 (6 rows)
 913
 914 SELECT listchildren('Top');
 915  listchildren
 916 --------------
 917  Child1
 918  Child2
 919  Child3
 920 (3 rows)
 921
 922 SELECT name, listchildren(name) FROM nodes;
 923   name  | listchildren
 924 --------+--------------
 925  Top    | Child1
 926  Top    | Child2
 927  Top    | Child3
 928  Child1 | SubChild1
 929  Child1 | SubChild2
 930 (5 rows)
 931 </screen>
 932
 933      In the last <command>SELECT</command>,
 934      notice that no output row appears for <literal>Child2</>, <literal>Child3</>, etc.
 935      This happens because <function>listchildren</function> returns an empty set
 936      for those arguments, so no result rows are generated.
 937     </para>
 938
 939     <note>
 940      <para>
 941       If a function's last command is <command>INSERT</>, <command>UPDATE</>,
 942       or <command>DELETE</> with <literal>RETURNING</>, that command will
 943       always be executed to completion, even if the function is not declared
 944       with <literal>SETOF</> or the calling query does not fetch all the
 945       result rows.  Any extra rows produced by the <literal>RETURNING</>
 946       clause are silently dropped, but the commanded table modifications
 947       still happen (and are all completed before returning from the function).
 948      </para>
 949     </note>
 950    </sect2>
 951
 952    <sect2 id="xfunc-sql-functions-returning-table">
 953     <title><acronym>SQL</acronym> Functions Returning <literal>TABLE</></title>
 954
 955     <indexterm>
 956      <primary>function</primary>
 957      <secondary>RETURNS TABLE</secondary>
 958     </indexterm>
 959
 960     <para>
 961      There is another way to declare a function as returning a set,
 962      which is to use the syntax
 963      <literal>RETURNS TABLE(<replaceable>columns</>)</literal>.
 964      This is equivalent to using one or more <literal>OUT</> parameters plus
 965      marking the function as returning <literal>SETOF record</> (or
 966      <literal>SETOF</> a single output parameter's type, as appropriate).
 967      This notation is specified in recent versions of the SQL standard, and
 968      thus may be more portable than using <literal>SETOF</>.
 969     </para>
 970
 971     <para>
 972      For example, the preceding sum-and-product example could also be
 973      done this way:
 974
 975 <programlisting>
 976 CREATE FUNCTION sum_n_product_with_tab (x int)
 977 RETURNS TABLE(sum int, product int) AS $$
 978     SELECT $1 + tab.y, $1 * tab.y FROM tab;
 979 $$ LANGUAGE SQL;
 980 </programlisting>
 981
 982      It is not allowed to use explicit <literal>OUT</> or <literal>INOUT</>
 983      parameters with the <literal>RETURNS TABLE</> notation &mdash; you must
 984      put all the output columns in the <literal>TABLE</> list.
 985     </para>
 986    </sect2>
 987
 988    <sect2>
 989     <title>Polymorphic <acronym>SQL</acronym> Functions</title>
 990
 991     <para>
 992      <acronym>SQL</acronym> functions can be declared to accept and
 993      return the polymorphic types <type>anyelement</type>,
 994      <type>anyarray</type>, <type>anynonarray</type>, and
 995      <type>anyenum</type>.  See <xref
 996      linkend="extend-types-polymorphic"> for a more detailed
 997      explanation of polymorphic functions. Here is a polymorphic
 998      function <function>make_array</function> that builds up an array
 999      from two arbitrary data type elements:
1000 <screen>
1001 CREATE FUNCTION make_array(anyelement, anyelement) RETURNS anyarray AS $$
1002     SELECT ARRAY[$1, $2];
1003 $$ LANGUAGE SQL;
1004
1005 SELECT make_array(1, 2) AS intarray, make_array('a'::text, 'b') AS textarray;
1006  intarray | textarray
1007 ----------+-----------
1008  {1,2}    | {a,b}
1009 (1 row)
1010 </screen>
1011     </para>
1012
1013     <para>
1014      Notice the use of the typecast <literal>'a'::text</literal>
1015      to specify that the argument is of type <type>text</type>. This is
1016      required if the argument is just a string literal, since otherwise
1017      it would be treated as type
1018      <type>unknown</type>, and array of <type>unknown</type> is not a valid
1019      type.
1020      Without the typecast, you will get errors like this:
1021 <screen>
1022 <computeroutput>
1023 ERROR:  could not determine polymorphic type because input has type "unknown"
1024 </computeroutput>
1025 </screen>
1026     </para>
1027
1028     <para>
1029      It is permitted to have polymorphic arguments with a fixed
1030      return type, but the converse is not. For example:
1031 <screen>
1032 CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$
1033     SELECT $1 &gt; $2;
1034 $$ LANGUAGE SQL;
1035
1036 SELECT is_greater(1, 2);
1037  is_greater
1038 ------------
1039  f
1040 (1 row)
1041
1042 CREATE FUNCTION invalid_func() RETURNS anyelement AS $$
1043     SELECT 1;
1044 $$ LANGUAGE SQL;
1045 ERROR:  cannot determine result data type
1046 DETAIL:  A function returning a polymorphic type must have at least one polymorphic argument.
1047 </screen>
1048     </para>
1049
1050     <para>
1051      Polymorphism can be used with functions that have output arguments.
1052      For example:
1053 <screen>
1054 CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray)
1055 AS 'select $1, array[$1,$1]' LANGUAGE SQL;
1056
1057 SELECT * FROM dup(22);
1058  f2 |   f3
1059 ----+---------
1060  22 | {22,22}
1061 (1 row)
1062 </screen>
1063     </para>
1064
1065     <para>
1066      Polymorphism can also be used with variadic functions.
1067      For example:
1068 <screen>
1069 CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
1070     SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
1071 $$ LANGUAGE SQL;
1072
1073 SELECT anyleast(10, -1, 5, 4);
1074  anyleast
1075 ----------
1076        -1
1077 (1 row)
1078
1079 SELECT anyleast('abc'::text, 'def');
1080  anyleast
1081 ----------
1082  abc
1083 (1 row)
1084
1085 CREATE FUNCTION concat(text, VARIADIC anyarray) RETURNS text AS $$
1086     SELECT array_to_string($2, $1);
1087 $$ LANGUAGE SQL;
1088
1089 SELECT concat('|', 1, 4, 2);
1090  concat
1091 --------
1092  1|4|2
1093 (1 row)
1094 </screen>
1095     </para>
1096    </sect2>
1097   </sect1>
1098
1099   <sect1 id="xfunc-overload">
1100    <title>Function Overloading</title>
1101
1102    <indexterm zone="xfunc-overload">
1103     <primary>overloading</primary>
1104     <secondary>functions</secondary>
1105    </indexterm>
1106
1107    <para>
1108     More than one function can be defined with the same SQL name, so long
1109     as the arguments they take are different.  In other words,
1110     function names can be <firstterm>overloaded</firstterm>.  When a
1111     query is executed, the server will determine which function to
1112     call from the data types and the number of the provided arguments.
1113     Overloading can also be used to simulate functions with a variable
1114     number of arguments, up to a finite maximum number.
1115    </para>
1116
1117    <para>
1118     When creating a family of overloaded functions, one should be
1119     careful not to create ambiguities.  For instance, given the
1120     functions:
1121 <programlisting>
1122 CREATE FUNCTION test(int, real) RETURNS ...
1123 CREATE FUNCTION test(smallint, double precision) RETURNS ...
1124 </programlisting>
1125     it is not immediately clear which function would be called with
1126     some trivial input like <literal>test(1, 1.5)</literal>.  The
1127     currently implemented resolution rules are described in
1128     <xref linkend="typeconv">, but it is unwise to design a system that subtly
1129     relies on this behavior.
1130    </para>
1131
1132    <para>
1133     A function that takes a single argument of a composite type should
1134     generally not have the same name as any attribute (field) of that type.
1135     Recall that <literal>attribute(table)</literal> is considered equivalent
1136     to <literal>table.attribute</literal>.  In the case that there is an
1137     ambiguity between a function on a composite type and an attribute of
1138     the composite type, the attribute will always be used.  It is possible
1139     to override that choice by schema-qualifying the function name
1140     (that is, <literal>schema.func(table)</literal>) but it's better to
1141     avoid the problem by not choosing conflicting names.
1142    </para>
1143
1144    <para>
1145     Another possible conflict is between variadic and non-variadic functions.
1146     For instance, it is possible to create both <literal>foo(numeric)</> and
1147     <literal>foo(VARIADIC numeric[])</>.  In this case it is unclear which one
1148     should be matched to a call providing a single numeric argument, such as
1149     <literal>foo(10.1)</>.  The rule is that the function appearing
1150     earlier in the search path is used, or if the two functions are in the
1151     same schema, the non-variadic one is preferred.
1152    </para>
1153
1154    <para>
1155     When overloading C-language functions, there is an additional
1156     constraint: The C name of each function in the family of
1157     overloaded functions must be different from the C names of all
1158     other functions, either internal or dynamically loaded.  If this
1159     rule is violated, the behavior is not portable.  You might get a
1160     run-time linker error, or one of the functions will get called
1161     (usually the internal one).  The alternative form of the
1162     <literal>AS</> clause for the SQL <command>CREATE
1163     FUNCTION</command> command decouples the SQL function name from
1164     the function name in the C source code.  For instance:
1165 <programlisting>
1166 CREATE FUNCTION test(int) RETURNS int
1167     AS '<replaceable>filename</>', 'test_1arg'
1168     LANGUAGE C;
1169 CREATE FUNCTION test(int, int) RETURNS int
1170     AS '<replaceable>filename</>', 'test_2arg'
1171     LANGUAGE C;
1172 </programlisting>
1173     The names of the C functions here reflect one of many possible conventions.
1174    </para>
1175   </sect1>
1176
1177   <sect1 id="xfunc-volatility">
1178    <title>Function Volatility Categories</title>
1179
1180    <indexterm zone="xfunc-volatility">
1181     <primary>volatility</primary>
1182     <secondary>functions</secondary>
1183    </indexterm>
1184    <indexterm zone="xfunc-volatility">
1185     <primary>VOLATILE</primary>
1186    </indexterm>
1187    <indexterm zone="xfunc-volatility">
1188     <primary>STABLE</primary>
1189    </indexterm>
1190    <indexterm zone="xfunc-volatility">
1191     <primary>IMMUTABLE</primary>
1192    </indexterm>
1193
1194    <para>
1195     Every function has a <firstterm>volatility</> classification, with
1196     the possibilities being <literal>VOLATILE</>, <literal>STABLE</>, or
1197     <literal>IMMUTABLE</>.  <literal>VOLATILE</> is the default if the
1198     <xref linkend="sql-createfunction">
1199     command does not specify a category.  The volatility category is a
1200     promise to the optimizer about the behavior of the function:
1201
1202    <itemizedlist>
1203     <listitem>
1204      <para>
1205       A <literal>VOLATILE</> function can do anything, including modifying
1206       the database.  It can return different results on successive calls with
1207       the same arguments.  The optimizer makes no assumptions about the
1208       behavior of such functions.  A query using a volatile function will
1209       re-evaluate the function at every row where its value is needed.
1210      </para>
1211     </listitem>
1212     <listitem>
1213      <para>
1214       A <literal>STABLE</> function cannot modify the database and is
1215       guaranteed to return the same results given the same arguments
1216       for all rows within a single statement. This category allows the
1217       optimizer to optimize multiple calls of the function to a single
1218       call. In particular, it is safe to use an expression containing
1219       such a function in an index scan condition. (Since an index scan
1220       will evaluate the comparison value only once, not once at each
1221       row, it is not valid to use a <literal>VOLATILE</> function in an
1222       index scan condition.)
1223      </para>
1224     </listitem>
1225     <listitem>
1226      <para>
1227       An <literal>IMMUTABLE</> function cannot modify the database and is
1228       guaranteed to return the same results given the same arguments forever.
1229       This category allows the optimizer to pre-evaluate the function when
1230       a query calls it with constant arguments.  For example, a query like
1231       <literal>SELECT ... WHERE x = 2 + 2</> can be simplified on sight to
1232       <literal>SELECT ... WHERE x = 4</>, because the function underlying
1233       the integer addition operator is marked <literal>IMMUTABLE</>.
1234      </para>
1235     </listitem>
1236    </itemizedlist>
1237    </para>
1238
1239    <para>
1240     For best optimization results, you should label your functions with the
1241     strictest volatility category that is valid for them.
1242    </para>
1243
1244    <para>
1245     Any function with side-effects <emphasis>must</> be labeled
1246     <literal>VOLATILE</>, so that calls to it cannot be optimized away.
1247     Even a function with no side-effects needs to be labeled
1248     <literal>VOLATILE</> if its value can change within a single query;
1249     some examples are <literal>random()</>, <literal>currval()</>,
1250     <literal>timeofday()</>.
1251    </para>
1252
1253    <para>
1254     Another important example is that the <function>current_timestamp</>
1255     family of functions qualify as <literal>STABLE</>, since their values do
1256     not change within a transaction.
1257    </para>
1258
1259    <para>
1260     There is relatively little difference between <literal>STABLE</> and
1261     <literal>IMMUTABLE</> categories when considering simple interactive
1262     queries that are planned and immediately executed: it doesn't matter
1263     a lot whether a function is executed once during planning or once during
1264     query execution startup.  But there is a big difference if the plan is
1265     saved and reused later.  Labeling a function <literal>IMMUTABLE</> when
1266     it really isn't might allow it to be prematurely folded to a constant during
1267     planning, resulting in a stale value being re-used during subsequent uses
1268     of the plan.  This is a hazard when using prepared statements or when
1269     using function languages that cache plans (such as
1270     <application>PL/pgSQL</>).
1271    </para>
1272
1273    <para>
1274     For functions written in SQL or in any of the standard procedural
1275     languages, there is a second important property determined by the
1276     volatility category, namely the visibility of any data changes that have
1277     been made by the SQL command that is calling the function.  A
1278     <literal>VOLATILE</> function will see such changes, a <literal>STABLE</>
1279     or <literal>IMMUTABLE</> function will not.  This behavior is implemented
1280     using the snapshotting behavior of MVCC (see <xref linkend="mvcc">):
1281     <literal>STABLE</> and <literal>IMMUTABLE</> functions use a snapshot
1282     established as of the start of the calling query, whereas
1283     <literal>VOLATILE</> functions obtain a fresh snapshot at the start of
1284     each query they execute.
1285    </para>
1286
1287    <note>
1288     <para>
1289      Functions written in C can manage snapshots however they want, but it's
1290      usually a good idea to make C functions work this way too.
1291     </para>
1292    </note>
1293
1294    <para>
1295     Because of this snapshotting behavior,
1296     a function containing only <command>SELECT</> commands can safely be
1297     marked <literal>STABLE</>, even if it selects from tables that might be
1298     undergoing modifications by concurrent queries.
1299     <productname>PostgreSQL</productname> will execute all commands of a
1300     <literal>STABLE</> function using the snapshot established for the
1301     calling query, and so it will see a fixed view of the database throughout
1302     that query.
1303    </para>
1304
1305    <para>
1306     The same snapshotting behavior is used for <command>SELECT</> commands
1307     within <literal>IMMUTABLE</> functions.  It is generally unwise to select
1308     from database tables within an <literal>IMMUTABLE</> function at all,
1309     since the immutability will be broken if the table contents ever change.
1310     However, <productname>PostgreSQL</productname> does not enforce that you
1311     do not do that.
1312    </para>
1313
1314    <para>
1315     A common error is to label a function <literal>IMMUTABLE</> when its
1316     results depend on a configuration parameter.  For example, a function
1317     that manipulates timestamps might well have results that depend on the
1318     <xref linkend="guc-timezone"> setting.  For safety, such functions should
1319     be labeled <literal>STABLE</> instead.
1320    </para>
1321
1322    <note>
1323     <para>
1324      Before <productname>PostgreSQL</productname> release 8.0, the requirement
1325      that <literal>STABLE</> and <literal>IMMUTABLE</> functions cannot modify
1326      the database was not enforced by the system.  Releases 8.0 and later enforce it
1327      by requiring SQL functions and procedural language functions of these
1328      categories to contain no SQL commands other than <command>SELECT</>.
1329      (This is not a completely bulletproof test, since such functions could
1330      still call <literal>VOLATILE</> functions that modify the database.
1331      If you do that, you will find that the <literal>STABLE</> or
1332      <literal>IMMUTABLE</> function does not notice the database changes
1333      applied by the called function, since they are hidden from its snapshot.)
1334     </para>
1335    </note>
1336   </sect1>
1337
1338   <sect1 id="xfunc-pl">
1339    <title>Procedural Language Functions</title>
1340
1341    <para>
1342     <productname>PostgreSQL</productname> allows user-defined functions
1343     to be written in other languages besides SQL and C.  These other
1344     languages are generically called <firstterm>procedural
1345     languages</firstterm> (<acronym>PL</>s).
1346     Procedural languages aren't built into the
1347     <productname>PostgreSQL</productname> server; they are offered
1348     by loadable modules.
1349     See <xref linkend="xplang"> and following chapters for more
1350     information.
1351    </para>
1352   </sect1>
1353
1354   <sect1 id="xfunc-internal">
1355    <title>Internal Functions</title>
1356
1357    <indexterm zone="xfunc-internal"><primary>function</><secondary>internal</></>
1358
1359    <para>
1360     Internal functions are functions written in C that have been statically
1361     linked into the <productname>PostgreSQL</productname> server.
1362     The <quote>body</quote> of the function definition
1363     specifies the C-language name of the function, which need not be the
1364     same as the name being declared for SQL use.
1365     (For reasons of backwards compatibility, an empty body
1366     is accepted as meaning that the C-language function name is the
1367     same as the SQL name.)
1368    </para>
1369
1370    <para>
1371     Normally, all internal functions present in the
1372     server are declared during the initialization of the database cluster
1373     (see <xref linkend="creating-cluster">),
1374     but a user could use <command>CREATE FUNCTION</command>
1375     to create additional alias names for an internal function.
1376     Internal functions are declared in <command>CREATE FUNCTION</command>
1377     with language name <literal>internal</literal>.  For instance, to
1378     create an alias for the <function>sqrt</function> function:
1379 <programlisting>
1380 CREATE FUNCTION square_root(double precision) RETURNS double precision
1381     AS 'dsqrt'
1382     LANGUAGE internal
1383     STRICT;
1384 </programlisting>
1385     (Most internal functions expect to be declared <quote>strict</quote>.)
1386    </para>
1387
1388    <note>
1389     <para>
1390      Not all <quote>predefined</quote> functions are
1391      <quote>internal</quote> in the above sense.  Some predefined
1392      functions are written in SQL.
1393     </para>
1394    </note>
1395   </sect1>
1396
1397   <sect1 id="xfunc-c">
1398    <title>C-Language Functions</title>
1399
1400    <indexterm zone="xfunc-c">
1401     <primary>function</primary>
1402     <secondary>user-defined</secondary>
1403     <tertiary>in C</tertiary>
1404    </indexterm>
1405
1406    <para>
1407     User-defined functions can be written in C (or a language that can
1408     be made compatible with C, such as C++).  Such functions are
1409     compiled into dynamically loadable objects (also called shared
1410     libraries) and are loaded by the server on demand.  The dynamic
1411     loading feature is what distinguishes <quote>C language</> functions
1412     from <quote>internal</> functions &mdash; the actual coding conventions
1413     are essentially the same for both.  (Hence, the standard internal
1414     function library is a rich source of coding examples for user-defined
1415     C functions.)
1416    </para>
1417
1418    <para>
1419     Two different calling conventions are currently used for C functions.
1420     The newer <quote>version 1</quote> calling convention is indicated by writing
1421     a <literal>PG_FUNCTION_INFO_V1()</literal> macro call for the function,
1422     as illustrated below.  Lack of such a macro indicates an old-style
1423     (<quote>version 0</quote>) function.  The language name specified in <command>CREATE FUNCTION</command>
1424     is <literal>C</literal> in either case.  Old-style functions are now deprecated
1425     because of portability problems and lack of functionality, but they
1426     are still supported for compatibility reasons.
1427    </para>
1428
1429   <sect2 id="xfunc-c-dynload">
1430    <title>Dynamic Loading</title>
1431
1432    <indexterm zone="xfunc-c-dynload">
1433     <primary>dynamic loading</primary>
1434    </indexterm>
1435
1436    <para>
1437     The first time a user-defined function in a particular
1438     loadable object file is called in a session,
1439     the dynamic loader loads that object file into memory so that the
1440     function can be called.  The <command>CREATE FUNCTION</command>
1441     for a user-defined C function must therefore specify two pieces of
1442     information for the function: the name of the loadable
1443     object file, and the C name (link symbol) of the specific function to call
1444     within that object file.  If the C name is not explicitly specified then
1445     it is assumed to be the same as the SQL function name.
1446    </para>
1447
1448    <para>
1449     The following algorithm is used to locate the shared object file
1450     based on the name given in the <command>CREATE FUNCTION</command>
1451     command:
1452
1453     <orderedlist>
1454      <listitem>
1455       <para>
1456        If the name is an absolute path, the given file is loaded.
1457       </para>
1458      </listitem>
1459
1460      <listitem>
1461       <para>
1462        If the name starts with the string <literal>$libdir</literal>,
1463        that part is replaced by the <productname>PostgreSQL</> package
1464         library directory
1465        name, which is determined at build time.<indexterm><primary>$libdir</></>
1466       </para>
1467      </listitem>
1468
1469      <listitem>
1470       <para>
1471        If the name does not contain a directory part, the file is
1472        searched for in the path specified by the configuration variable
1473        <xref linkend="guc-dynamic-library-path">.<indexterm><primary>dynamic_library_path</></>
1474       </para>
1475      </listitem>
1476
1477      <listitem>
1478       <para>
1479        Otherwise (the file was not found in the path, or it contains a
1480        non-absolute directory part), the dynamic loader will try to
1481        take the name as given, which will most likely fail.  (It is
1482        unreliable to depend on the current working directory.)
1483       </para>
1484      </listitem>
1485     </orderedlist>
1486
1487     If this sequence does not work, the platform-specific shared
1488     library file name extension (often <filename>.so</filename>) is
1489     appended to the given name and this sequence is tried again.  If
1490     that fails as well, the load will fail.
1491    </para>
1492
1493    <para>
1494     It is recommended to locate shared libraries either relative to
1495     <literal>$libdir</literal> or through the dynamic library path.
1496     This simplifies version upgrades if the new installation is at a
1497     different location.  The actual directory that
1498     <literal>$libdir</literal> stands for can be found out with the
1499     command <literal>pg_config --pkglibdir</literal>.
1500    </para>
1501
1502    <para>
1503     The user ID the <productname>PostgreSQL</productname> server runs
1504     as must be able to traverse the path to the file you intend to
1505     load.  Making the file or a higher-level directory not readable
1506     and/or not executable by the <systemitem>postgres</systemitem>
1507     user is a common mistake.
1508    </para>
1509
1510    <para>
1511     In any case, the file name that is given in the
1512     <command>CREATE FUNCTION</command> command is recorded literally
1513     in the system catalogs, so if the file needs to be loaded again
1514     the same procedure is applied.
1515    </para>
1516
1517    <note>
1518     <para>
1519      <productname>PostgreSQL</productname> will not compile a C function
1520      automatically.  The object file must be compiled before it is referenced
1521      in a <command>CREATE
1522      FUNCTION</> command.  See <xref linkend="dfunc"> for additional
1523      information.
1524     </para>
1525    </note>
1526
1527    <indexterm zone="xfunc-c-dynload">
1528     <primary>magic block</primary>
1529    </indexterm>
1530
1531    <para>
1532     To ensure that a dynamically loaded object file is not loaded into an
1533     incompatible server, <productname>PostgreSQL</productname> checks that the
1534     file contains a <quote>magic block</> with the appropriate contents.
1535     This allows the server to detect obvious incompatibilities, such as code
1536     compiled for a different major version of
1537     <productname>PostgreSQL</productname>.  A magic block is required as of
1538     <productname>PostgreSQL</productname> 8.2.  To include a magic block,
1539     write this in one (and only one) of the module source files, after having
1540     included the header <filename>fmgr.h</>:
1541
1542 <programlisting>
1543 #ifdef PG_MODULE_MAGIC
1544 PG_MODULE_MAGIC;
1545 #endif
1546 </programlisting>
1547
1548     The <literal>#ifdef</> test can be omitted if the code doesn't
1549     need to compile against pre-8.2 <productname>PostgreSQL</productname>
1550     releases.
1551    </para>
1552
1553    <para>
1554     After it is used for the first time, a dynamically loaded object
1555     file is retained in memory.  Future calls in the same session to
1556     the function(s) in that file will only incur the small overhead of
1557     a symbol table lookup.  If you need to force a reload of an object
1558     file, for example after recompiling it, begin a fresh session.
1559    </para>
1560
1561    <indexterm zone="xfunc-c-dynload">
1562     <primary>_PG_init</primary>
1563    </indexterm>
1564    <indexterm zone="xfunc-c-dynload">
1565     <primary>_PG_fini</primary>
1566    </indexterm>
1567    <indexterm zone="xfunc-c-dynload">
1568     <primary>library initialization function</primary>
1569    </indexterm>
1570    <indexterm zone="xfunc-c-dynload">
1571     <primary>library finalization function</primary>
1572    </indexterm>
1573
1574    <para>
1575     Optionally, a dynamically loaded file can contain initialization and
1576     finalization functions.  If the file includes a function named
1577     <function>_PG_init</>, that function will be called immediately after
1578     loading the file.  The function receives no parameters and should
1579     return void.  If the file includes a function named
1580     <function>_PG_fini</>, that function will be called immediately before
1581     unloading the file.  Likewise, the function receives no parameters and
1582     should return void.  Note that <function>_PG_fini</> will only be called
1583     during an unload of the file, not during process termination.
1584     (Presently, unloads are disabled and will never occur, but this may
1585     change in the future.)
1586    </para>
1587
1588   </sect2>
1589
1590    <sect2 id="xfunc-c-basetype">
1591     <title>Base Types in C-Language Functions</title>
1592
1593     <indexterm zone="xfunc-c-basetype">
1594      <primary>data type</primary>
1595      <secondary>internal organization</secondary>
1596     </indexterm>
1597
1598     <para>
1599      To know how to write C-language functions, you need to know how
1600      <productname>PostgreSQL</productname> internally represents base
1601      data types and how they can be passed to and from functions.
1602      Internally, <productname>PostgreSQL</productname> regards a base
1603      type as a <quote>blob of memory</quote>.  The user-defined
1604      functions that you define over a type in turn define the way that
1605      <productname>PostgreSQL</productname> can operate on it.  That
1606      is, <productname>PostgreSQL</productname> will only store and
1607      retrieve the data from disk and use your user-defined functions
1608      to input, process, and output the data.
1609     </para>
1610
1611     <para>
1612      Base types can have one of three internal formats:
1613
1614      <itemizedlist>
1615       <listitem>
1616        <para>
1617         pass by value, fixed-length
1618        </para>
1619       </listitem>
1620       <listitem>
1621        <para>
1622         pass by reference, fixed-length
1623        </para>
1624       </listitem>
1625       <listitem>
1626        <para>
1627         pass by reference, variable-length
1628        </para>
1629       </listitem>
1630      </itemizedlist>
1631     </para>
1632
1633     <para>
1634      By-value  types  can  only be 1, 2, or 4 bytes in length
1635      (also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
1636      You should be careful to define your types such that they will be the
1637      same size (in bytes) on all architectures.  For example, the
1638      <literal>long</literal> type is dangerous because it is 4 bytes on some
1639      machines and 8 bytes on others, whereas <type>int</type> type is 4 bytes
1640      on most Unix machines.  A reasonable implementation of the
1641      <type>int4</type> type on Unix machines might be:
1642
1643 <programlisting>
1644 /* 4-byte integer, passed by value */
1645 typedef int int4;
1646 </programlisting>
1647     </para>
1648
1649     <para>
1650      On  the  other hand, fixed-length types of any size can
1651      be passed by-reference.  For example, here is a  sample
1652      implementation of a <productname>PostgreSQL</productname> type:
1653
1654 <programlisting>
1655 /* 16-byte structure, passed by reference */
1656 typedef struct
1657 {
1658     double  x, y;
1659 } Point;
1660 </programlisting>
1661
1662      Only  pointers  to  such types can be used when passing
1663      them in and out of <productname>PostgreSQL</productname> functions.
1664      To return a value of such a type, allocate the right amount of
1665      memory with <literal>palloc</literal>, fill in the allocated memory,
1666      and return a pointer to it.  (Also, if you just want to return the
1667      same value as one of your input arguments that's of the same data type,
1668      you can skip the extra <literal>palloc</literal> and just return the
1669      pointer to the input value.)
1670     </para>
1671
1672     <para>
1673      Finally, all variable-length types must also be  passed
1674      by  reference.   All  variable-length  types must begin
1675      with a length field of exactly 4 bytes, and all data to
1676      be  stored within that type must be located in the memory
1677      immediately  following  that  length  field.   The
1678      length field contains the total length of the structure,
1679      that is,  it  includes  the  size  of  the  length  field
1680      itself.
1681     </para>
1682
1683     <warning>
1684      <para>
1685       <emphasis>Never</> modify the contents of a pass-by-reference input
1686       value.  If you do so you are likely to corrupt on-disk data, since
1687       the pointer you are given might point directly into a disk buffer.
1688       The sole exception to this rule is explained in
1689       <xref linkend="xaggr">.
1690      </para>
1691     </warning>
1692
1693     <para>
1694      As an example, we can define the type <type>text</type> as
1695      follows:
1696
1697 <programlisting>
1698 typedef struct {
1699     int4 length;
1700     char data[1];
1701 } text;
1702 </programlisting>
1703
1704      Obviously,  the  data  field declared here is not long enough to hold
1705      all possible strings.  Since it's impossible to declare a variable-size
1706      structure in <acronym>C</acronym>, we rely on the knowledge that the
1707      <acronym>C</acronym> compiler won't range-check array subscripts.  We
1708      just allocate the necessary amount of space and then access the array as
1709      if it were declared the right length.  (This is a common trick, which
1710      you can read about in many textbooks about C.)
1711     </para>
1712
1713     <para>
1714      When manipulating
1715      variable-length types, we must  be  careful  to  allocate
1716      the  correct amount  of memory and set the length field correctly.
1717      For example, if we wanted to  store  40  bytes  in  a <structname>text</>
1718      structure, we might use a code fragment like this:
1719
1720 <programlisting><![CDATA[
1721 #include "postgres.h"
1722 ...
1723 char buffer[40]; /* our source data */
1724 ...
1725 text *destination = (text *) palloc(VARHDRSZ + 40);
1726 destination->length = VARHDRSZ + 40;
1727 memcpy(destination->data, buffer, 40);
1728 ...
1729 ]]>
1730 </programlisting>
1731
1732      <literal>VARHDRSZ</> is the same as <literal>sizeof(int4)</>, but
1733      it's considered good style to use the macro <literal>VARHDRSZ</>
1734      to refer to the size of the overhead for a variable-length type.
1735     </para>
1736
1737     <para>
1738      <xref linkend="xfunc-c-type-table"> specifies which C type
1739      corresponds to which SQL type when writing a C-language function
1740      that uses a built-in type of <productname>PostgreSQL</>.
1741      The <quote>Defined In</quote> column gives the header file that
1742      needs to be included to get the type definition.  (The actual
1743      definition might be in a different file that is included by the
1744      listed file.  It is recommended that users stick to the defined
1745      interface.)  Note that you should always include
1746      <filename>postgres.h</filename> first in any source file, because
1747      it declares a number of things that you will need anyway.
1748     </para>
1749
1750      <table tocentry="1" id="xfunc-c-type-table">
1751       <title>Equivalent C Types for Built-In SQL Types</title>
1752       <tgroup cols="3">
1753        <thead>
1754         <row>
1755          <entry>
1756           SQL Type
1757          </entry>
1758          <entry>
1759           C Type
1760          </entry>
1761          <entry>
1762           Defined In
1763          </entry>
1764         </row>
1765        </thead>
1766        <tbody>
1767         <row>
1768          <entry><type>abstime</type></entry>
1769          <entry><type>AbsoluteTime</type></entry>
1770          <entry><filename>utils/nabstime.h</filename></entry>
1771         </row>
1772         <row>
1773          <entry><type>boolean</type></entry>
1774          <entry><type>bool</type></entry>
1775          <entry><filename>postgres.h</filename> (maybe compiler built-in)</entry>
1776         </row>
1777         <row>
1778          <entry><type>box</type></entry>
1779          <entry><type>BOX*</type></entry>
1780          <entry><filename>utils/geo_decls.h</filename></entry>
1781         </row>
1782         <row>
1783          <entry><type>bytea</type></entry>
1784          <entry><type>bytea*</type></entry>
1785          <entry><filename>postgres.h</filename></entry>
1786         </row>
1787         <row>
1788          <entry><type>"char"</type></entry>
1789          <entry><type>char</type></entry>
1790          <entry>(compiler built-in)</entry>
1791         </row>
1792         <row>
1793          <entry><type>character</type></entry>
1794          <entry><type>BpChar*</type></entry>
1795          <entry><filename>postgres.h</filename></entry>
1796         </row>
1797         <row>
1798          <entry><type>cid</type></entry>
1799          <entry><type>CommandId</type></entry>
1800          <entry><filename>postgres.h</filename></entry>
1801         </row>
1802         <row>
1803          <entry><type>date</type></entry>
1804          <entry><type>DateADT</type></entry>
1805          <entry><filename>utils/date.h</filename></entry>
1806         </row>
1807         <row>
1808          <entry><type>smallint</type> (<type>int2</type>)</entry>
1809          <entry><type>int2</type> or <type>int16</type></entry>
1810          <entry><filename>postgres.h</filename></entry>
1811         </row>
1812         <row>
1813          <entry><type>int2vector</type></entry>
1814          <entry><type>int2vector*</type></entry>
1815          <entry><filename>postgres.h</filename></entry>
1816         </row>
1817         <row>
1818          <entry><type>integer</type> (<type>int4</type>)</entry>
1819          <entry><type>int4</type> or <type>int32</type></entry>
1820          <entry><filename>postgres.h</filename></entry>
1821         </row>
1822         <row>
1823          <entry><type>real</type> (<type>float4</type>)</entry>
1824          <entry><type>float4*</type></entry>
1825         <entry><filename>postgres.h</filename></entry>
1826         </row>
1827         <row>
1828          <entry><type>double precision</type> (<type>float8</type>)</entry>
1829          <entry><type>float8*</type></entry>
1830          <entry><filename>postgres.h</filename></entry>
1831         </row>
1832         <row>
1833          <entry><type>interval</type></entry>
1834          <entry><type>Interval*</type></entry>
1835          <entry><filename>utils/timestamp.h</filename></entry>
1836         </row>
1837         <row>
1838          <entry><type>lseg</type></entry>
1839          <entry><type>LSEG*</type></entry>
1840          <entry><filename>utils/geo_decls.h</filename></entry>
1841         </row>
1842         <row>
1843          <entry><type>name</type></entry>
1844          <entry><type>Name</type></entry>
1845          <entry><filename>postgres.h</filename></entry>
1846         </row>
1847         <row>
1848          <entry><type>oid</type></entry>
1849          <entry><type>Oid</type></entry>
1850          <entry><filename>postgres.h</filename></entry>
1851         </row>
1852         <row>
1853          <entry><type>oidvector</type></entry>
1854          <entry><type>oidvector*</type></entry>
1855          <entry><filename>postgres.h</filename></entry>
1856         </row>
1857         <row>
1858          <entry><type>path</type></entry>
1859          <entry><type>PATH*</type></entry>
1860          <entry><filename>utils/geo_decls.h</filename></entry>
1861         </row>
1862         <row>
1863          <entry><type>point</type></entry>
1864          <entry><type>POINT*</type></entry>
1865          <entry><filename>utils/geo_decls.h</filename></entry>
1866         </row>
1867         <row>
1868          <entry><type>regproc</type></entry>
1869          <entry><type>regproc</type></entry>
1870          <entry><filename>postgres.h</filename></entry>
1871         </row>
1872         <row>
1873          <entry><type>reltime</type></entry>
1874          <entry><type>RelativeTime</type></entry>
1875          <entry><filename>utils/nabstime.h</filename></entry>
1876         </row>
1877         <row>
1878          <entry><type>text</type></entry>
1879          <entry><type>text*</type></entry>
1880          <entry><filename>postgres.h</filename></entry>
1881         </row>
1882         <row>
1883          <entry><type>tid</type></entry>
1884          <entry><type>ItemPointer</type></entry>
1885          <entry><filename>storage/itemptr.h</filename></entry>
1886         </row>
1887         <row>
1888          <entry><type>time</type></entry>
1889          <entry><type>TimeADT</type></entry>
1890          <entry><filename>utils/date.h</filename></entry>
1891         </row>
1892         <row>
1893          <entry><type>time with time zone</type></entry>
1894          <entry><type>TimeTzADT</type></entry>
1895          <entry><filename>utils/date.h</filename></entry>
1896         </row>
1897         <row>
1898          <entry><type>timestamp</type></entry>
1899          <entry><type>Timestamp*</type></entry>
1900          <entry><filename>utils/timestamp.h</filename></entry>
1901         </row>
1902         <row>
1903          <entry><type>tinterval</type></entry>
1904          <entry><type>TimeInterval</type></entry>
1905          <entry><filename>utils/nabstime.h</filename></entry>
1906         </row>
1907         <row>
1908          <entry><type>varchar</type></entry>
1909          <entry><type>VarChar*</type></entry>
1910          <entry><filename>postgres.h</filename></entry>
1911         </row>
1912         <row>
1913          <entry><type>xid</type></entry>
1914          <entry><type>TransactionId</type></entry>
1915          <entry><filename>postgres.h</filename></entry>
1916         </row>
1917        </tbody>
1918       </tgroup>
1919      </table>
1920
1921     <para>
1922      Now that we've gone over all of the possible structures
1923      for base types, we can show some examples of real functions.
1924     </para>
1925    </sect2>
1926
1927    <sect2>
1928     <title>Version 0 Calling Conventions</title>
1929
1930     <para>
1931      We present the <quote>old style</quote> calling convention first &mdash; although
1932      this approach is now deprecated, it's easier to get a handle on
1933      initially.  In the version-0 method, the arguments and result
1934      of the C function are just declared in normal C style, but being
1935      careful to use the C representation of each SQL data type as shown
1936      above.
1937     </para>
1938
1939     <para>
1940      Here are some examples:
1941
1942 <programlisting><![CDATA[
1943 #include "postgres.h"
1944 #include <string.h>
1945 #include "utils/geo_decls.h"
1946
1947 #ifdef PG_MODULE_MAGIC
1948 PG_MODULE_MAGIC;
1949 #endif
1950
1951 /* by value */
1952
1953 int
1954 add_one(int arg)
1955 {
1956     return arg + 1;
1957 }
1958
1959 /* by reference, fixed length */
1960
1961 float8 *
1962 add_one_float8(float8 *arg)
1963 {
1964     float8    *result = (float8 *) palloc(sizeof(float8));
1965
1966     *result = *arg + 1.0;
1967
1968     return result;
1969 }
1970
1971 Point *
1972 makepoint(Point *pointx, Point *pointy)
1973 {
1974     Point     *new_point = (Point *) palloc(sizeof(Point));
1975
1976     new_point->x = pointx->x;
1977     new_point->y = pointy->y;
1978
1979     return new_point;
1980 }
1981
1982 /* by reference, variable length */
1983
1984 text *
1985 copytext(text *t)
1986 {
1987     /*
1988      * VARSIZE is the total size of the struct in bytes.
1989      */
1990     text *new_t = (text *) palloc(VARSIZE(t));
1991     SET_VARSIZE(new_t, VARSIZE(t));
1992     /*
1993      * VARDATA is a pointer to the data region of the struct.
1994      */
1995     memcpy((void *) VARDATA(new_t), /* destination */
1996            (void *) VARDATA(t),     /* source */
1997            VARSIZE(t) - VARHDRSZ);  /* how many bytes */
1998     return new_t;
1999 }
2000
2001 text *
2002 concat_text(text *arg1, text *arg2)
2003 {
2004     int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
2005     text *new_text = (text *) palloc(new_text_size);
2006
2007     SET_VARSIZE(new_text, new_text_size);
2008     memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1) - VARHDRSZ);
2009     memcpy(VARDATA(new_text) + (VARSIZE(arg1) - VARHDRSZ),
2010            VARDATA(arg2), VARSIZE(arg2) - VARHDRSZ);
2011     return new_text;
2012 }
2013 ]]>
2014 </programlisting>
2015     </para>
2016
2017     <para>
2018      Supposing that the above code has been prepared in file
2019      <filename>funcs.c</filename> and compiled into a shared object,
2020      we could define the functions to <productname>PostgreSQL</productname>
2021      with commands like this:
2022
2023 <programlisting>
2024 CREATE FUNCTION add_one(integer) RETURNS integer
2025      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one'
2026      LANGUAGE C STRICT;
2027
2028 -- note overloading of SQL function name "add_one"
2029 CREATE FUNCTION add_one(double precision) RETURNS double precision
2030      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one_float8'
2031      LANGUAGE C STRICT;
2032
2033 CREATE FUNCTION makepoint(point, point) RETURNS point
2034      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'makepoint'
2035      LANGUAGE C STRICT;
2036
2037 CREATE FUNCTION copytext(text) RETURNS text
2038      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'copytext'
2039      LANGUAGE C STRICT;
2040
2041 CREATE FUNCTION concat_text(text, text) RETURNS text
2042      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'concat_text'
2043      LANGUAGE C STRICT;
2044 </programlisting>
2045     </para>
2046
2047     <para>
2048      Here, <replaceable>DIRECTORY</replaceable> stands for the
2049      directory of the shared library file (for instance the
2050      <productname>PostgreSQL</productname> tutorial directory, which
2051      contains the code for the examples used in this section).
2052      (Better style would be to use just <literal>'funcs'</> in the
2053      <literal>AS</> clause, after having added
2054      <replaceable>DIRECTORY</replaceable> to the search path.  In any
2055      case, we can omit the system-specific extension for a shared
2056      library, commonly <literal>.so</literal> or
2057      <literal>.sl</literal>.)
2058     </para>
2059
2060     <para>
2061      Notice that we have specified the functions as <quote>strict</quote>,
2062      meaning that
2063      the system should automatically assume a null result if any input
2064      value is null.  By doing this, we avoid having to check for null inputs
2065      in the function code.  Without this, we'd have to check for null values
2066      explicitly, by checking for a null pointer for each
2067      pass-by-reference argument.  (For pass-by-value arguments, we don't
2068      even have a way to check!)
2069     </para>
2070
2071     <para>
2072      Although this calling convention is simple to use,
2073      it is not very portable; on some architectures there are problems
2074      with passing data types that are smaller than <type>int</type> this way.  Also, there is
2075      no simple way to return a null result, nor to cope with null arguments
2076      in any way other than making the function strict.  The version-1
2077      convention, presented next, overcomes these objections.
2078     </para>
2079    </sect2>
2080
2081    <sect2>
2082     <title>Version 1 Calling Conventions</title>
2083
2084     <para>
2085      The version-1 calling convention relies on macros to suppress most
2086      of the complexity of passing arguments and results.  The C declaration
2087      of a version-1 function is always:
2088 <programlisting>
2089 Datum funcname(PG_FUNCTION_ARGS)
2090 </programlisting>
2091      In addition, the macro call:
2092 <programlisting>
2093 PG_FUNCTION_INFO_V1(funcname);
2094 </programlisting>
2095      must appear in the same source file.  (Conventionally, it's
2096      written just before the function itself.)  This macro call is not
2097      needed for <literal>internal</>-language functions, since
2098      <productname>PostgreSQL</> assumes that all internal functions
2099      use the version-1 convention.  It is, however, required for
2100      dynamically-loaded functions.
2101     </para>
2102
2103     <para>
2104      In a version-1 function, each actual argument is fetched using a
2105      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2106      macro that corresponds to the argument's data type, and the
2107      result is returned using a
2108      <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2109      macro for the return type.
2110      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2111      takes as its argument the number of the function argument to
2112      fetch, where the count starts at 0.
2113      <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2114      takes as its argument the actual value to return.
2115     </para>
2116
2117     <para>
2118      Here we show the same functions as above, coded in version-1 style:
2119
2120 <programlisting><![CDATA[
2121 #include "postgres.h"
2122 #include <string.h>
2123 #include "fmgr.h"
2124 #include "utils/geo_decls.h"
2125
2126 #ifdef PG_MODULE_MAGIC
2127 PG_MODULE_MAGIC;
2128 #endif
2129
2130 /* by value */
2131
2132 PG_FUNCTION_INFO_V1(add_one);
2133
2134 Datum
2135 add_one(PG_FUNCTION_ARGS)
2136 {
2137     int32   arg = PG_GETARG_INT32(0);
2138
2139     PG_RETURN_INT32(arg + 1);
2140 }
2141
2142 /* by reference, fixed length */
2143
2144 PG_FUNCTION_INFO_V1(add_one_float8);
2145
2146 Datum
2147 add_one_float8(PG_FUNCTION_ARGS)
2148 {
2149     /* The macros for FLOAT8 hide its pass-by-reference nature. */
2150     float8   arg = PG_GETARG_FLOAT8(0);
2151
2152     PG_RETURN_FLOAT8(arg + 1.0);
2153 }
2154
2155 PG_FUNCTION_INFO_V1(makepoint);
2156
2157 Datum
2158 makepoint(PG_FUNCTION_ARGS)
2159 {
2160     /* Here, the pass-by-reference nature of Point is not hidden. */
2161     Point     *pointx = PG_GETARG_POINT_P(0);
2162     Point     *pointy = PG_GETARG_POINT_P(1);
2163     Point     *new_point = (Point *) palloc(sizeof(Point));
2164
2165     new_point->x = pointx->x;
2166     new_point->y = pointy->y;
2167
2168     PG_RETURN_POINT_P(new_point);
2169 }
2170
2171 /* by reference, variable length */
2172
2173 PG_FUNCTION_INFO_V1(copytext);
2174
2175 Datum
2176 copytext(PG_FUNCTION_ARGS)
2177 {
2178     text     *t = PG_GETARG_TEXT_P(0);
2179     /*
2180      * VARSIZE is the total size of the struct in bytes.
2181      */
2182     text     *new_t = (text *) palloc(VARSIZE(t));
2183     SET_VARSIZE(new_t, VARSIZE(t));
2184     /*
2185      * VARDATA is a pointer to the data region of the struct.
2186      */
2187     memcpy((void *) VARDATA(new_t), /* destination */
2188            (void *) VARDATA(t),     /* source */
2189            VARSIZE(t) - VARHDRSZ);  /* how many bytes */
2190     PG_RETURN_TEXT_P(new_t);
2191 }
2192
2193 PG_FUNCTION_INFO_V1(concat_text);
2194
2195 Datum
2196 concat_text(PG_FUNCTION_ARGS)
2197 {
2198     text  *arg1 = PG_GETARG_TEXT_P(0);
2199     text  *arg2 = PG_GETARG_TEXT_P(1);
2200     int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
2201     text *new_text = (text *) palloc(new_text_size);
2202
2203     SET_VARSIZE(new_text, new_text_size);
2204     memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1) - VARHDRSZ);
2205     memcpy(VARDATA(new_text) + (VARSIZE(arg1) - VARHDRSZ),
2206            VARDATA(arg2), VARSIZE(arg2) - VARHDRSZ);
2207     PG_RETURN_TEXT_P(new_text);
2208 }
2209 ]]>
2210 </programlisting>
2211     </para>
2212
2213     <para>
2214      The <command>CREATE FUNCTION</command> commands are the same as
2215      for the version-0 equivalents.
2216     </para>
2217
2218     <para>
2219      At first glance, the version-1 coding conventions might appear to
2220      be just pointless obscurantism.  They do, however, offer a number
2221      of improvements, because the macros can hide unnecessary detail.
2222      An example is that in coding <function>add_one_float8</>, we no longer need to
2223      be aware that <type>float8</type> is a pass-by-reference type.  Another
2224      example is that the <literal>GETARG</> macros for variable-length types allow
2225      for more efficient fetching of <quote>toasted</quote> (compressed or
2226      out-of-line) values.
2227     </para>
2228
2229     <para>
2230      One big improvement in version-1 functions is better handling of null
2231      inputs and results.  The macro <function>PG_ARGISNULL(<replaceable>n</>)</function>
2232      allows a function to test whether each input is null.  (Of course, doing
2233      this is only necessary in functions not declared <quote>strict</>.)
2234      As with the
2235      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
2236      the input arguments are counted beginning at zero.  Note that one
2237      should refrain from executing
2238      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
2239      one has verified that the argument isn't null.
2240      To return a null result, execute <function>PG_RETURN_NULL()</function>;
2241      this works in both strict and nonstrict functions.
2242     </para>
2243
2244     <para>
2245      Other options provided in the new-style interface are two
2246      variants of the
2247      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2248      macros. The first of these,
2249      <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>,
2250      guarantees to return a copy of the specified argument that is
2251      safe for writing into. (The normal macros will sometimes return a
2252      pointer to a value that is physically stored in a table, which
2253      must not be written to. Using the
2254      <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
2255      macros guarantees a writable result.)
2256     The second variant consists of the
2257     <function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function>
2258     macros which take three arguments. The first is the number of the
2259     function argument (as above). The second and third are the offset and
2260     length of the segment to be returned. Offsets are counted from
2261     zero, and a negative length requests that the remainder of the
2262     value be returned. These macros provide more efficient access to
2263     parts of large values in the case where they have storage type
2264     <quote>external</quote>. (The storage type of a column can be specified using
2265     <literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER
2266     COLUMN <replaceable>colname</replaceable> SET STORAGE
2267     <replaceable>storagetype</replaceable></literal>. <replaceable>storagetype</replaceable> is one of
2268     <literal>plain</>, <literal>external</>, <literal>extended</literal>,
2269      or <literal>main</>.)
2270     </para>
2271
2272     <para>
2273      Finally, the version-1 function call conventions make it possible
2274      to return set results (<xref linkend="xfunc-c-return-set">) and
2275      implement trigger functions (<xref linkend="triggers">) and
2276      procedural-language call handlers (<xref
2277      linkend="plhandler">).  Version-1 code is also more
2278      portable than version-0, because it does not break restrictions
2279      on function call protocol in the C standard.  For more details
2280      see <filename>src/backend/utils/fmgr/README</filename> in the
2281      source distribution.
2282     </para>
2283    </sect2>
2284
2285    <sect2>
2286     <title>Writing Code</title>
2287
2288     <para>
2289      Before we turn to the more advanced topics, we should discuss
2290      some coding rules for <productname>PostgreSQL</productname>
2291      C-language functions.  While it might be possible to load functions
2292      written in languages other than C into
2293      <productname>PostgreSQL</productname>, this is usually difficult
2294      (when it is possible at all) because other languages, such as
2295      C++, FORTRAN, or Pascal often do not follow the same calling
2296      convention as C.  That is, other languages do not pass argument
2297      and return values between functions in the same way.  For this
2298      reason, we will assume that your C-language functions are
2299      actually written in C.
2300     </para>
2301
2302     <para>
2303      The basic rules for writing and building C functions are as follows:
2304
2305      <itemizedlist>
2306       <listitem>
2307        <para>
2308         Use <literal>pg_config
2309         --includedir-server</literal><indexterm><primary>pg_config</><secondary>with user-defined C functions</></>
2310         to find out where the <productname>PostgreSQL</> server header
2311         files are installed on your system (or the system that your
2312         users will be running on).
2313        </para>
2314       </listitem>
2315
2316       <listitem>
2317        <para>
2318         Compiling and linking your code so that it can be dynamically
2319         loaded into <productname>PostgreSQL</productname> always
2320         requires special flags.  See <xref linkend="dfunc"> for a
2321         detailed explanation of how to do it for your particular
2322         operating system.
2323        </para>
2324       </listitem>
2325
2326       <listitem>
2327        <para>
2328         Remember to define a <quote>magic block</> for your shared library,
2329         as described in <xref linkend="xfunc-c-dynload">.
2330        </para>
2331       </listitem>
2332
2333       <listitem>
2334        <para>
2335         When allocating memory, use the
2336         <productname>PostgreSQL</productname> functions
2337         <function>palloc</function><indexterm><primary>palloc</></> and <function>pfree</function><indexterm><primary>pfree</></>
2338         instead of the corresponding C library functions
2339         <function>malloc</function> and <function>free</function>.
2340         The memory allocated by <function>palloc</function> will be
2341         freed automatically at the end of each transaction, preventing
2342         memory leaks.
2343        </para>
2344       </listitem>
2345
2346       <listitem>
2347        <para>
2348         Always zero the bytes of your structures using
2349         <function>memset</function>.  Without this, it's difficult to
2350         support hash indexes or hash joins, as you must pick out only
2351         the significant bits of your data structure to compute a hash.
2352         Even if you initialize all fields of your structure, there might be
2353         alignment padding (holes in the structure) that contain
2354         garbage values.
2355        </para>
2356       </listitem>
2357
2358       <listitem>
2359        <para>
2360         Most of the internal <productname>PostgreSQL</productname>
2361         types are declared in <filename>postgres.h</filename>, while
2362         the function manager interfaces
2363         (<symbol>PG_FUNCTION_ARGS</symbol>, etc.)  are in
2364         <filename>fmgr.h</filename>, so you will need to include at
2365         least these two files.  For portability reasons it's best to
2366         include <filename>postgres.h</filename> <emphasis>first</>,
2367         before any other system or user header files.  Including
2368         <filename>postgres.h</filename> will also include
2369         <filename>elog.h</filename> and <filename>palloc.h</filename>
2370         for you.
2371        </para>
2372       </listitem>
2373
2374       <listitem>
2375        <para>
2376         Symbol names defined within object files must not conflict
2377         with each other or with symbols defined in the
2378         <productname>PostgreSQL</productname> server executable.  You
2379         will have to rename your functions or variables if you get
2380         error messages to this effect.
2381        </para>
2382       </listitem>
2383      </itemizedlist>
2384     </para>
2385    </sect2>
2386
2387 &dfunc;
2388
2389    <sect2 id="xfunc-c-pgxs">
2390     <title>Extension Building Infrastructure</title>
2391
2392    <indexterm zone="xfunc-c-pgxs">
2393     <primary>pgxs</primary>
2394    </indexterm>
2395
2396    <para>
2397     If you are thinking about distributing your
2398     <productname>PostgreSQL</> extension modules, setting up a
2399     portable build system for them can be fairly difficult.  Therefore
2400     the <productname>PostgreSQL</> installation provides a build
2401     infrastructure for extensions, called <acronym>PGXS</acronym>, so
2402     that simple extension modules can be built simply against an
2403     already installed server.  Note that this infrastructure is not
2404     intended to be a universal build system framework that can be used
2405     to build all software interfacing to <productname>PostgreSQL</>;
2406     it simply automates common build rules for simple server extension
2407     modules.  For more complicated packages, you need to write your
2408     own build system.
2409    </para>
2410
2411    <para>
2412     To use the infrastructure for your extension, you must write a
2413     simple makefile.  In that makefile, you need to set some variables
2414     and finally include the global <acronym>PGXS</acronym> makefile.
2415     Here is an example that builds an extension module named
2416     <literal>isbn_issn</literal> consisting of a shared library, an
2417     SQL script, and a documentation text file:
2418 <programlisting>
2419 MODULES = isbn_issn
2420 DATA_built = isbn_issn.sql
2421 DOCS = README.isbn_issn
2422
2423 PG_CONFIG = pg_config
2424 PGXS := $(shell $(PG_CONFIG) --pgxs)
2425 include $(PGXS)
2426 </programlisting>
2427     The last three lines should always be the same.  Earlier in the
2428     file, you assign variables or add custom
2429     <application>make</application> rules.
2430    </para>
2431
2432    <para>
2433     Set one of these three variables to specify what is built:
2434
2435     <variablelist>
2436      <varlistentry>
2437       <term><varname>MODULES</varname></term>
2438       <listitem>
2439        <para>
2440         list of shared objects to be built from source files with same
2441         stem (do not include suffix in this list)
2442        </para>
2443       </listitem>
2444      </varlistentry>
2445
2446      <varlistentry>
2447       <term><varname>MODULE_big</varname></term>
2448       <listitem>
2449        <para>
2450         a shared object to build from multiple source files
2451         (list object files in <varname>OBJS</varname>)
2452        </para>
2453       </listitem>
2454      </varlistentry>
2455
2456      <varlistentry>
2457       <term><varname>PROGRAM</varname></term>
2458       <listitem>
2459        <para>
2460         a binary program to build
2461         (list object files in <varname>OBJS</varname>)
2462        </para>
2463       </listitem>
2464      </varlistentry>
2465     </variablelist>
2466
2467     The following variables can also be set:
2468
2469     <variablelist>
2470      <varlistentry>
2471       <term><varname>MODULEDIR</varname></term>
2472       <listitem>
2473        <para>
2474         subdirectory into which DATA and DOCS files should be
2475         installed (if not set, default is <literal>contrib</literal>)
2476        </para>
2477       </listitem>
2478      </varlistentry>
2479
2480      <varlistentry>
2481       <term><varname>DATA</varname></term>
2482       <listitem>
2483        <para>
2484         random files to install into <literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>
2485        </para>
2486       </listitem>
2487      </varlistentry>
2488
2489      <varlistentry>
2490       <term><varname>DATA_built</varname></term>
2491       <listitem>
2492        <para>
2493         random files to install into
2494         <literal><replaceable>prefix</replaceable>/share/$MODULEDIR</literal>,
2495         which need to be built first
2496        </para>
2497       </listitem>
2498      </varlistentry>
2499
2500      <varlistentry>
2501       <term><varname>DATA_TSEARCH</varname></term>
2502       <listitem>
2503        <para>
2504         random files to install under
2505         <literal><replaceable>prefix</replaceable>/share/tsearch_data</literal>
2506        </para>
2507       </listitem>
2508      </varlistentry>
2509
2510      <varlistentry>
2511       <term><varname>DOCS</varname></term>
2512       <listitem>
2513        <para>
2514         random files to install under
2515         <literal><replaceable>prefix</replaceable>/doc/$MODULEDIR</literal>
2516        </para>
2517       </listitem>
2518      </varlistentry>
2519
2520      <varlistentry>
2521       <term><varname>SCRIPTS</varname></term>
2522       <listitem>
2523        <para>
2524         script files (not binaries) to install into
2525         <literal><replaceable>prefix</replaceable>/bin</literal>
2526        </para>
2527       </listitem>
2528      </varlistentry>
2529
2530      <varlistentry>
2531       <term><varname>SCRIPTS_built</varname></term>
2532       <listitem>
2533        <para>
2534         script files (not binaries) to install into
2535         <literal><replaceable>prefix</replaceable>/bin</literal>,
2536         which need to be built first
2537        </para>
2538       </listitem>
2539      </varlistentry>
2540
2541      <varlistentry>
2542       <term><varname>REGRESS</varname></term>
2543       <listitem>
2544        <para>
2545         list of regression test cases (without suffix), see below
2546        </para>
2547       </listitem>
2548      </varlistentry>
2549
2550      <varlistentry>
2551       <term><varname>EXTRA_CLEAN</varname></term>
2552       <listitem>
2553        <para>
2554         extra files to remove in <literal>make clean</literal>
2555        </para>
2556       </listitem>
2557      </varlistentry>
2558
2559      <varlistentry>
2560       <term><varname>PG_CPPFLAGS</varname></term>
2561       <listitem>
2562        <para>
2563         will be added to <varname>CPPFLAGS</varname>
2564        </para>
2565       </listitem>
2566      </varlistentry>
2567
2568      <varlistentry>
2569       <term><varname>PG_LIBS</varname></term>
2570       <listitem>
2571        <para>
2572         will be added to <varname>PROGRAM</varname> link line
2573        </para>
2574       </listitem>
2575      </varlistentry>
2576
2577      <varlistentry>
2578       <term><varname>SHLIB_LINK</varname></term>
2579       <listitem>
2580        <para>
2581         will be added to <varname>MODULE_big</varname> link line
2582        </para>
2583       </listitem>
2584      </varlistentry>
2585
2586      <varlistentry>
2587       <term><varname>PG_CONFIG</varname></term>
2588       <listitem>
2589        <para>
2590         path to <application>pg_config</> program for the
2591         <productname>PostgreSQL</productname> installation to build against
2592         (typically just <literal>pg_config</> to use the first one in your
2593         <varname>PATH</>)
2594        </para>
2595       </listitem>
2596      </varlistentry>
2597     </variablelist>
2598    </para>
2599
2600    <para>
2601     Put this makefile as <literal>Makefile</literal> in the directory
2602     which holds your extension. Then you can do
2603     <literal>make</literal> to compile, and later <literal>make
2604     install</literal> to install your module.  By default, the extension is
2605     compiled and installed for the
2606     <productname>PostgreSQL</productname> installation that
2607     corresponds to the first <command>pg_config</command> program
2608     found in your path.  You can use a different installation by
2609     setting <varname>PG_CONFIG</varname> to point to its
2610     <command>pg_config</command> program, either within the makefile
2611     or on the <literal>make</literal> command line.
2612    </para>
2613
2614    <caution>
2615     <para>
2616      Changing <varname>PG_CONFIG</varname> only works when building
2617      against <productname>PostgreSQL</productname> 8.3 or later.
2618      With older releases it does not work to set it to anything except
2619      <literal>pg_config</>; you must alter your <varname>PATH</>
2620      to select the installation to build against.
2621     </para>
2622    </caution>
2623
2624    <para>
2625     The scripts listed in the <varname>REGRESS</> variable are used for
2626     regression testing of your module, just like <literal>make
2627     installcheck</literal> is used for the main
2628     <productname>PostgreSQL</productname> server.  For this to work you need
2629     to have a subdirectory named <literal>sql/</literal> in your extension's
2630     directory, within which you put one file for each group of tests you want
2631     to run.  The files should have extension <literal>.sql</literal>, which
2632     should not be included in the <varname>REGRESS</varname> list in the
2633     makefile.  For each test there should be a file containing the expected
2634     result in a subdirectory named <literal>expected/</literal>, with extension
2635     <literal>.out</literal>.  The tests are run by executing <literal>make
2636     installcheck</literal>, and the resulting output will be compared to the
2637     expected files.  The differences will be written to the file
2638     <literal>regression.diffs</literal> in <command>diff -c</command> format.
2639     Note that trying to run a test which is missing the expected file will be
2640     reported as <quote>trouble</quote>, so make sure you have all expected
2641     files.
2642    </para>
2643
2644    <tip>
2645     <para>
2646      The easiest way of creating the expected files is creating empty files,
2647      then carefully inspecting the result files after a test run (to be found
2648      in the <literal>results/</literal> directory), and copying them to
2649      <literal>expected/</literal> if they match what you want from the test.
2650     </para>
2651
2652    </tip>
2653   </sect2>
2654
2655
2656    <sect2>
2657     <title>Composite-Type Arguments</title>
2658
2659     <para>
2660      Composite types do not have a fixed layout like C structures.
2661      Instances of a composite type can contain null fields.  In
2662      addition, composite types that are part of an inheritance
2663      hierarchy can have different fields than other members of the
2664      same inheritance hierarchy.  Therefore,
2665      <productname>PostgreSQL</productname> provides a function
2666      interface for accessing fields of composite types from C.
2667     </para>
2668
2669     <para>
2670      Suppose we want to write a function to answer the query:
2671
2672 <programlisting>
2673 SELECT name, c_overpaid(emp, 1500) AS overpaid
2674     FROM emp
2675     WHERE name = 'Bill' OR name = 'Sam';
2676 </programlisting>
2677
2678      Using call conventions version 0, we can define
2679      <function>c_overpaid</> as:
2680
2681 <programlisting><![CDATA[
2682 #include "postgres.h"
2683 #include "executor/executor.h"  /* for GetAttributeByName() */
2684
2685 #ifdef PG_MODULE_MAGIC
2686 PG_MODULE_MAGIC;
2687 #endif
2688
2689 bool
2690 c_overpaid(HeapTupleHeader t, /* the current row of emp */
2691            int32 limit)
2692 {
2693     bool isnull;
2694     int32 salary;
2695
2696     salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
2697     if (isnull)
2698         return false;
2699     return salary > limit;
2700 }
2701 ]]>
2702 </programlisting>
2703
2704      In version-1 coding, the above would look like this:
2705
2706 <programlisting><![CDATA[
2707 #include "postgres.h"
2708 #include "executor/executor.h"  /* for GetAttributeByName() */
2709
2710 #ifdef PG_MODULE_MAGIC
2711 PG_MODULE_MAGIC;
2712 #endif
2713
2714 PG_FUNCTION_INFO_V1(c_overpaid);
2715
2716 Datum
2717 c_overpaid(PG_FUNCTION_ARGS)
2718 {
2719     HeapTupleHeader  t = PG_GETARG_HEAPTUPLEHEADER(0);
2720     int32            limit = PG_GETARG_INT32(1);
2721     bool isnull;
2722     Datum salary;
2723
2724     salary = GetAttributeByName(t, "salary", &isnull);
2725     if (isnull)
2726         PG_RETURN_BOOL(false);
2727     /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
2728
2729     PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
2730 }
2731 ]]>
2732 </programlisting>
2733     </para>
2734
2735     <para>
2736      <function>GetAttributeByName</function> is the
2737      <productname>PostgreSQL</productname> system function that
2738      returns attributes out of the specified row.  It has
2739      three arguments: the argument of type <type>HeapTupleHeader</type> passed
2740      into
2741      the  function, the name of the desired attribute, and a
2742      return parameter that tells whether  the  attribute
2743      is  null.   <function>GetAttributeByName</function> returns a <type>Datum</type>
2744      value that you can convert to the proper data type by using the
2745      appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function>
2746      macro.  Note that the return value is meaningless if the null flag is
2747      set; always check the null flag before trying to do anything with the
2748      result.
2749     </para>
2750
2751     <para>
2752      There is also <function>GetAttributeByNum</function>, which selects
2753      the target attribute by column number instead of name.
2754     </para>
2755
2756     <para>
2757      The following command declares the function
2758      <function>c_overpaid</function> in SQL:
2759
2760 <programlisting>
2761 CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
2762     AS '<replaceable>DIRECTORY</replaceable>/funcs', 'c_overpaid'
2763     LANGUAGE C STRICT;
2764 </programlisting>
2765
2766      Notice we have used <literal>STRICT</> so that we did not have to
2767      check whether the input arguments were NULL.
2768     </para>
2769    </sect2>
2770
2771    <sect2>
2772     <title>Returning Rows (Composite Types)</title>
2773
2774     <para>
2775      To return a row or composite-type value from a C-language
2776      function, you can use a special API that provides macros and
2777      functions to hide most of the complexity of building composite
2778      data types.  To use this API, the source file must include:
2779 <programlisting>
2780 #include "funcapi.h"
2781 </programlisting>
2782     </para>
2783
2784     <para>
2785      There are two ways you can build a composite data value (henceforth
2786      a <quote>tuple</>): you can build it from an array of Datum values,
2787      or from an array of C strings that can be passed to the input
2788      conversion functions of the tuple's column data types.  In either
2789      case, you first need to obtain or construct a <structname>TupleDesc</>
2790      descriptor for the tuple structure.  When working with Datums, you
2791      pass the <structname>TupleDesc</> to <function>BlessTupleDesc</>,
2792      and then call <function>heap_form_tuple</> for each row.  When working
2793      with C strings, you pass the <structname>TupleDesc</> to
2794      <function>TupleDescGetAttInMetadata</>, and then call
2795      <function>BuildTupleFromCStrings</> for each row.  In the case of a
2796      function returning a set of tuples, the setup steps can all be done
2797      once during the first call of the function.
2798     </para>
2799
2800     <para>
2801      Several helper functions are available for setting up the needed
2802      <structname>TupleDesc</>.  The recommended way to do this in most
2803      functions returning composite values is to call:
2804 <programlisting>
2805 TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
2806                                    Oid *resultTypeId,
2807                                    TupleDesc *resultTupleDesc)
2808 </programlisting>
2809      passing the same <literal>fcinfo</> struct passed to the calling function
2810      itself.  (This of course requires that you use the version-1
2811      calling conventions.)  <varname>resultTypeId</> can be specified
2812      as <literal>NULL</> or as the address of a local variable to receive the
2813      function's result type OID.  <varname>resultTupleDesc</> should be the
2814      address of a local <structname>TupleDesc</> variable.  Check that the
2815      result is <literal>TYPEFUNC_COMPOSITE</>; if so,
2816      <varname>resultTupleDesc</> has been filled with the needed
2817      <structname>TupleDesc</>.  (If it is not, you can report an error along
2818      the lines of <quote>function returning record called in context that
2819      cannot accept type record</quote>.)
2820     </para>
2821
2822     <tip>
2823      <para>
2824       <function>get_call_result_type</> can resolve the actual type of a
2825       polymorphic function result; so it is useful in functions that return
2826       scalar polymorphic results, not only functions that return composites.
2827       The <varname>resultTypeId</> output is primarily useful for functions
2828       returning polymorphic scalars.
2829      </para>
2830     </tip>
2831
2832     <note>
2833      <para>
2834       <function>get_call_result_type</> has a sibling
2835       <function>get_expr_result_type</>, which can be used to resolve the
2836       expected output type for a function call represented by an expression
2837       tree.  This can be used when trying to determine the result type from
2838       outside the function itself.  There is also
2839       <function>get_func_result_type</>, which can be used when only the
2840       function's OID is available.  However these functions are not able
2841       to deal with functions declared to return <structname>record</>, and
2842       <function>get_func_result_type</> cannot resolve polymorphic types,
2843       so you should preferentially use <function>get_call_result_type</>.
2844      </para>
2845     </note>
2846
2847     <para>
2848      Older, now-deprecated functions for obtaining
2849      <structname>TupleDesc</>s are:
2850 <programlisting>
2851 TupleDesc RelationNameGetTupleDesc(const char *relname)
2852 </programlisting>
2853      to get a <structname>TupleDesc</> for the row type of a named relation,
2854      and:
2855 <programlisting>
2856 TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
2857 </programlisting>
2858      to get a <structname>TupleDesc</> based on a type OID. This can
2859      be used to get a <structname>TupleDesc</> for a base or
2860      composite type.  It will not work for a function that returns
2861      <structname>record</>, however, and it cannot resolve polymorphic
2862      types.
2863     </para>
2864
2865     <para>
2866      Once you have a <structname>TupleDesc</>, call:
2867 <programlisting>
2868 TupleDesc BlessTupleDesc(TupleDesc tupdesc)
2869 </programlisting>
2870      if you plan to work with Datums, or:
2871 <programlisting>
2872 AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
2873 </programlisting>
2874      if you plan to work with C strings.  If you are writing a function
2875      returning set, you can save the results of these functions in the
2876      <structname>FuncCallContext</> structure &mdash; use the
2877      <structfield>tuple_desc</> or <structfield>attinmeta</> field
2878      respectively.
2879     </para>
2880
2881     <para>
2882      When working with Datums, use:
2883 <programlisting>
2884 HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)
2885 </programlisting>
2886      to build a <structname>HeapTuple</> given user data in Datum form.
2887     </para>
2888
2889     <para>
2890      When working with C strings, use:
2891 <programlisting>
2892 HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
2893 </programlisting>
2894      to build a <structname>HeapTuple</> given user data
2895      in C string form.  <literal>values</literal> is an array of C strings,
2896      one for each attribute of the return row. Each C string should be in
2897      the form expected by the input function of the attribute data
2898      type. In order to return a null value for one of the attributes,
2899      the corresponding pointer in the <parameter>values</> array
2900      should be set to <symbol>NULL</>.  This function will need to
2901      be called again for each row you return.
2902     </para>
2903
2904     <para>
2905      Once you have built a tuple to return from your function, it
2906      must be converted into a <type>Datum</>. Use:
2907 <programlisting>
2908 HeapTupleGetDatum(HeapTuple tuple)
2909 </programlisting>
2910      to convert a <structname>HeapTuple</> into a valid Datum.  This
2911      <type>Datum</> can be returned directly if you intend to return
2912      just a single row, or it can be used as the current return value
2913      in a set-returning function.
2914     </para>
2915
2916     <para>
2917      An example appears in the next section.
2918     </para>
2919
2920    </sect2>
2921
2922    <sect2 id="xfunc-c-return-set">
2923     <title>Returning Sets</title>
2924
2925     <para>
2926      There is also a special API that provides support for returning
2927      sets (multiple rows) from a C-language function.  A set-returning
2928      function must follow the version-1 calling conventions.  Also,
2929      source files must include <filename>funcapi.h</filename>, as
2930      above.
2931     </para>
2932
2933     <para>
2934      A set-returning function (<acronym>SRF</>) is called
2935      once for each item it returns.  The <acronym>SRF</> must
2936      therefore save enough state to remember what it was doing and
2937      return the next item on each call.
2938      The structure <structname>FuncCallContext</> is provided to help
2939      control this process.  Within a function, <literal>fcinfo-&gt;flinfo-&gt;fn_extra</>
2940      is used to hold a pointer to <structname>FuncCallContext</>
2941      across calls.
2942 <programlisting>
2943 typedef struct
2944 {
2945     /*
2946      * Number of times we've been called before
2947      *
2948      * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
2949      * incremented for you every time SRF_RETURN_NEXT() is called.
2950      */
2951     uint32 call_cntr;
2952
2953     /*
2954      * OPTIONAL maximum number of calls
2955      *
2956      * max_calls is here for convenience only and setting it is optional.
2957      * If not set, you must provide alternative means to know when the
2958      * function is done.
2959      */
2960     uint32 max_calls;
2961
2962     /*
2963      * OPTIONAL pointer to result slot
2964      *
2965      * This is obsolete and only present for backwards compatibility, viz,
2966      * user-defined SRFs that use the deprecated TupleDescGetSlot().
2967      */
2968     TupleTableSlot *slot;
2969
2970     /*
2971      * OPTIONAL pointer to miscellaneous user-provided context information
2972      *
2973      * user_fctx is for use as a pointer to your own data to retain
2974      * arbitrary context information between calls of your function.
2975      */
2976     void *user_fctx;
2977
2978     /*
2979      * OPTIONAL pointer to struct containing attribute type input metadata
2980      *
2981      * attinmeta is for use when returning tuples (i.e., composite data types)
2982      * and is not used when returning base data types. It is only needed
2983      * if you intend to use BuildTupleFromCStrings() to create the return
2984      * tuple.
2985      */
2986     AttInMetadata *attinmeta;
2987
2988     /*
2989      * memory context used for structures that must live for multiple calls
2990      *
2991      * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
2992      * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
2993      * context for any memory that is to be reused across multiple calls
2994      * of the SRF.
2995      */
2996     MemoryContext multi_call_memory_ctx;
2997
2998     /*
2999      * OPTIONAL pointer to struct containing tuple description
3000      *
3001      * tuple_desc is for use when returning tuples (i.e., composite data types)
3002      * and is only needed if you are going to build the tuples with
3003      * heap_form_tuple() rather than with BuildTupleFromCStrings().  Note that
3004      * the TupleDesc pointer stored here should usually have been run through
3005      * BlessTupleDesc() first.
3006      */
3007     TupleDesc tuple_desc;
3008
3009 } FuncCallContext;
3010 </programlisting>
3011     </para>
3012
3013     <para>
3014      An <acronym>SRF</> uses several functions and macros that
3015      automatically manipulate the <structname>FuncCallContext</>
3016      structure (and expect to find it via <literal>fn_extra</>).  Use:
3017 <programlisting>
3018 SRF_IS_FIRSTCALL()
3019 </programlisting>
3020      to determine if your function is being called for the first or a
3021      subsequent time. On the first call (only) use:
3022 <programlisting>
3023 SRF_FIRSTCALL_INIT()
3024 </programlisting>
3025      to initialize the <structname>FuncCallContext</>. On every function call,
3026      including the first, use:
3027 <programlisting>
3028 SRF_PERCALL_SETUP()
3029 </programlisting>
3030      to properly set up for using the <structname>FuncCallContext</>
3031      and clearing any previously returned data left over from the
3032      previous pass.
3033     </para>
3034
3035     <para>
3036      If your function has data to return, use:
3037 <programlisting>
3038 SRF_RETURN_NEXT(funcctx, result)
3039 </programlisting>
3040      to return it to the caller.  (<literal>result</> must be of type
3041      <type>Datum</>, either a single value or a tuple prepared as
3042      described above.)  Finally, when your function is finished
3043      returning data, use:
3044 <programlisting>
3045 SRF_RETURN_DONE(funcctx)
3046 </programlisting>
3047      to clean up and end the <acronym>SRF</>.
3048     </para>
3049
3050     <para>
3051      The memory context that is current when the <acronym>SRF</> is called is
3052      a transient context that will be cleared between calls.  This means
3053      that you do not need to call <function>pfree</> on everything
3054      you allocated using <function>palloc</>; it will go away anyway.  However, if you want to allocate
3055      any data structures to live across calls, you need to put them somewhere
3056      else.  The memory context referenced by
3057      <structfield>multi_call_memory_ctx</> is a suitable location for any
3058      data that needs to survive until the <acronym>SRF</> is finished running.  In most
3059      cases, this means that you should switch into
3060      <structfield>multi_call_memory_ctx</> while doing the first-call setup.
3061     </para>
3062
3063     <para>
3064      A complete pseudo-code example looks like the following:
3065 <programlisting>
3066 Datum
3067 my_set_returning_function(PG_FUNCTION_ARGS)
3068 {
3069     FuncCallContext  *funcctx;
3070     Datum             result;
3071     <replaceable>further declarations as needed</replaceable>
3072
3073     if (SRF_IS_FIRSTCALL())
3074     {
3075         MemoryContext oldcontext;
3076
3077         funcctx = SRF_FIRSTCALL_INIT();
3078         oldcontext = MemoryContextSwitchTo(funcctx-&gt;multi_call_memory_ctx);
3079         /* One-time setup code appears here: */
3080         <replaceable>user code</replaceable>
3081         <replaceable>if returning composite</replaceable>
3082             <replaceable>build TupleDesc, and perhaps AttInMetadata</replaceable>
3083         <replaceable>endif returning composite</replaceable>
3084         <replaceable>user code</replaceable>
3085         MemoryContextSwitchTo(oldcontext);
3086     }
3087
3088     /* Each-time setup code appears here: */
3089     <replaceable>user code</replaceable>
3090     funcctx = SRF_PERCALL_SETUP();
3091     <replaceable>user code</replaceable>
3092
3093     /* this is just one way we might test whether we are done: */
3094     if (funcctx-&gt;call_cntr &lt; funcctx-&gt;max_calls)
3095     {
3096         /* Here we want to return another item: */
3097         <replaceable>user code</replaceable>
3098         <replaceable>obtain result Datum</replaceable>
3099         SRF_RETURN_NEXT(funcctx, result);
3100     }
3101     else
3102     {
3103         /* Here we are done returning items and just need to clean up: */
3104         <replaceable>user code</replaceable>
3105         SRF_RETURN_DONE(funcctx);
3106     }
3107 }
3108 </programlisting>
3109     </para>
3110
3111     <para>
3112      A complete example of a simple <acronym>SRF</> returning a composite type
3113      looks like:
3114 <programlisting><![CDATA[
3115 PG_FUNCTION_INFO_V1(retcomposite);
3116
3117 Datum
3118 retcomposite(PG_FUNCTION_ARGS)
3119 {
3120     FuncCallContext     *funcctx;
3121     int                  call_cntr;
3122     int                  max_calls;
3123     TupleDesc            tupdesc;
3124     AttInMetadata       *attinmeta;
3125
3126     /* stuff done only on the first call of the function */
3127     if (SRF_IS_FIRSTCALL())
3128     {
3129         MemoryContext   oldcontext;
3130
3131         /* create a function context for cross-call persistence */
3132         funcctx = SRF_FIRSTCALL_INIT();
3133
3134         /* switch to memory context appropriate for multiple function calls */
3135         oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
3136
3137         /* total number of tuples to be returned */
3138         funcctx->max_calls = PG_GETARG_UINT32(0);
3139
3140         /* Build a tuple descriptor for our result type */
3141         if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
3142             ereport(ERROR,
3143                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
3144                      errmsg("function returning record called in context "
3145                             "that cannot accept type record")));
3146
3147         /*
3148          * generate attribute metadata needed later to produce tuples from raw
3149          * C strings
3150          */
3151         attinmeta = TupleDescGetAttInMetadata(tupdesc);
3152         funcctx->attinmeta = attinmeta;
3153
3154         MemoryContextSwitchTo(oldcontext);
3155     }
3156
3157     /* stuff done on every call of the function */
3158     funcctx = SRF_PERCALL_SETUP();
3159
3160     call_cntr = funcctx->call_cntr;
3161     max_calls = funcctx->max_calls;
3162     attinmeta = funcctx->attinmeta;
3163
3164     if (call_cntr < max_calls)    /* do when there is more left to send */
3165     {
3166         char       **values;
3167         HeapTuple    tuple;
3168         Datum        result;
3169
3170         /*
3171          * Prepare a values array for building the returned tuple.
3172          * This should be an array of C strings which will
3173          * be processed later by the type input functions.
3174          */
3175         values = (char **) palloc(3 * sizeof(char *));
3176         values[0] = (char *) palloc(16 * sizeof(char));
3177         values[1] = (char *) palloc(16 * sizeof(char));
3178         values[2] = (char *) palloc(16 * sizeof(char));
3179
3180         snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
3181         snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
3182         snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));
3183
3184         /* build a tuple */
3185         tuple = BuildTupleFromCStrings(attinmeta, values);
3186
3187         /* make the tuple into a datum */
3188         result = HeapTupleGetDatum(tuple);
3189
3190         /* clean up (this is not really necessary) */
3191         pfree(values[0]);
3192         pfree(values[1]);
3193         pfree(values[2]);
3194         pfree(values);
3195
3196         SRF_RETURN_NEXT(funcctx, result);
3197     }
3198     else    /* do when there is no more left */
3199     {
3200         SRF_RETURN_DONE(funcctx);
3201     }
3202 }
3203 ]]>
3204 </programlisting>
3205
3206      One way to declare this function in SQL is:
3207 <programlisting>
3208 CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);
3209
3210 CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
3211     RETURNS SETOF __retcomposite
3212     AS '<replaceable>filename</>', 'retcomposite'
3213     LANGUAGE C IMMUTABLE STRICT;
3214 </programlisting>
3215      A different way is to use OUT parameters:
3216 <programlisting>
3217 CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
3218     OUT f1 integer, OUT f2 integer, OUT f3 integer)
3219     RETURNS SETOF record
3220     AS '<replaceable>filename</>', 'retcomposite'
3221     LANGUAGE C IMMUTABLE STRICT;
3222 </programlisting>
3223      Notice that in this method the output type of the function is formally
3224      an anonymous <structname>record</> type.
3225     </para>
3226
3227     <para>
3228      The directory <filename>contrib/tablefunc</> in the source
3229      distribution contains more examples of set-returning functions.
3230     </para>
3231    </sect2>
3232
3233    <sect2>
3234     <title>Polymorphic Arguments and Return Types</title>
3235
3236     <para>
3237      C-language functions can be declared to accept and
3238      return the polymorphic types
3239      <type>anyelement</type>, <type>anyarray</type>, <type>anynonarray</type>,
3240      and <type>anyenum</type>.
3241      See <xref linkend="extend-types-polymorphic"> for a more detailed explanation
3242      of polymorphic functions. When function arguments or return types
3243      are defined as polymorphic types, the function author cannot know
3244      in advance what data type it will be called with, or
3245      need to return. There are two routines provided in <filename>fmgr.h</>
3246      to allow a version-1 C function to discover the actual data types
3247      of its arguments and the type it is expected to return. The routines are
3248      called <literal>get_fn_expr_rettype(FmgrInfo *flinfo)</> and
3249      <literal>get_fn_expr_argtype(FmgrInfo *flinfo, int argnum)</>.
3250      They return the result or argument type OID, or <symbol>InvalidOid</symbol> if the
3251      information is not available.
3252      The structure <literal>flinfo</> is normally accessed as
3253      <literal>fcinfo-&gt;flinfo</>. The parameter <literal>argnum</>
3254      is zero based.  <function>get_call_result_type</> can also be used
3255      as an alternative to <function>get_fn_expr_rettype</>.
3256     </para>
3257
3258     <para>
3259      For example, suppose we want to write a function to accept a single
3260      element of any type, and return a one-dimensional array of that type:
3261
3262 <programlisting>
3263 PG_FUNCTION_INFO_V1(make_array);
3264 Datum
3265 make_array(PG_FUNCTION_ARGS)
3266 {
3267     ArrayType  *result;
3268     Oid         element_type = get_fn_expr_argtype(fcinfo-&gt;flinfo, 0);
3269     Datum       element;
3270     bool        isnull;
3271     int16       typlen;
3272     bool        typbyval;
3273     char        typalign;
3274     int         ndims;
3275     int         dims[MAXDIM];
3276     int         lbs[MAXDIM];
3277
3278     if (!OidIsValid(element_type))
3279         elog(ERROR, "could not determine data type of input");
3280
3281     /* get the provided element, being careful in case it's NULL */
3282     isnull = PG_ARGISNULL(0);
3283     if (isnull)
3284         element = (Datum) 0;
3285     else
3286         element = PG_GETARG_DATUM(0);
3287
3288     /* we have one dimension */
3289     ndims = 1;
3290     /* and one element */
3291     dims[0] = 1;
3292     /* and lower bound is 1 */
3293     lbs[0] = 1;
3294
3295     /* get required info about the element type */
3296     get_typlenbyvalalign(element_type, &amp;typlen, &amp;typbyval, &amp;typalign);
3297
3298     /* now build the array */
3299     result = construct_md_array(&amp;element, &amp;isnull, ndims, dims, lbs,
3300                                 element_type, typlen, typbyval, typalign);
3301
3302     PG_RETURN_ARRAYTYPE_P(result);
3303 }
3304 </programlisting>
3305     </para>
3306
3307     <para>
3308      The following command declares the function
3309      <function>make_array</function> in SQL:
3310
3311 <programlisting>
3312 CREATE FUNCTION make_array(anyelement) RETURNS anyarray
3313     AS '<replaceable>DIRECTORY</replaceable>/funcs', 'make_array'
3314     LANGUAGE C IMMUTABLE;
3315 </programlisting>
3316     </para>
3317
3318     <para>
3319      There is a variant of polymorphism that is only available to C-language
3320      functions: they can be declared to take parameters of type
3321      <literal>"any"</>.  (Note that this type name must be double-quoted,
3322      since it's also a SQL reserved word.)  This works like
3323      <type>anyelement</> except that it does not constrain different
3324      <literal>"any"</> arguments to be the same type, nor do they help
3325      determine the function's result type.  A C-language function can also
3326      declare its final parameter to be <literal>VARIADIC "any"</>.  This will
3327      match one or more actual arguments of any type (not necessarily the same
3328      type).  These arguments will <emphasis>not</> be gathered into an array
3329      as happens with normal variadic functions; they will just be passed to
3330      the function separately.  The <function>PG_NARGS()</> macro and the
3331      methods described above must be used to determine the number of actual
3332      arguments and their types when using this feature.
3333     </para>
3334    </sect2>
3335
3336    <sect2>
3337     <title>Shared Memory and LWLocks</title>
3338
3339     <para>
3340      Add-ins can reserve LWLocks and an allocation of shared memory on server
3341      startup.  The add-in's shared library must be preloaded by specifying
3342      it in
3343      <xref linkend="guc-shared-preload-libraries"><indexterm><primary>shared-preload-libraries</></>.
3344      Shared memory is reserved by calling:
3345 <programlisting>
3346 void RequestAddinShmemSpace(int size)
3347 </programlisting>
3348      from your <function>_PG_init</> function.
3349     </para>
3350     <para>
3351      LWLocks are reserved by calling:
3352 <programlisting>
3353 void RequestAddinLWLocks(int n)
3354 </programlisting>
3355      from <function>_PG_init</>.
3356     </para>
3357     <para>
3358      To avoid possible race-conditions, each backend should use the LWLock
3359      <function>AddinShmemInitLock</> when connecting to and initializing
3360      its allocation of shared memory, as shown here:
3361 <programlisting>
3362         static mystruct *ptr = NULL;
3363
3364         if (!ptr)
3365         {
3366                 bool    found;
3367
3368                 LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
3369                 ptr = ShmemInitStruct("my struct name", size, &amp;found);
3370                 if (!found)
3371                 {
3372                         initialize contents of shmem area;
3373                         acquire any requested LWLocks using:
3374                         ptr->mylockid = LWLockAssign();
3375                 }
3376                 LWLockRelease(AddinShmemInitLock);
3377         }
3378 </programlisting>
3379     </para>
3380    </sect2>
3381   </sect1>