<para>
To install PL/Python in a particular database, use
- <literal>CREATE EXTENSION plpythonu</>, or from the shell command line use
- <literal>createlang plpythonu <replaceable>dbname</></literal> (but
+ <literal>CREATE EXTENSION plpythonu</> (but
see also <xref linkend="plpython-python23">).
</para>
</tip>
<para>
- As of <productname>PostgreSQL</productname> 7.4, PL/Python is only
- available as an <quote>untrusted</> language, meaning it does not
- offer any way of restricting what users can do in it. It has
- therefore been renamed to <literal>plpythonu</>. The trusted
- variant <literal>plpython</> might become available again in future,
- if a new secure execution mechanism is developed in Python. The
+ PL/Python is only available as an <quote>untrusted</> language, meaning
+ it does not offer any way of restricting what users can do in it and
+ is therefore named <literal>plpythonu</>. A trusted
+ variant <literal>plpython</> might become available in the future
+ if a secure execution mechanism is developed in Python. The
writer of a function in untrusted PL/Python must take care that the
function cannot be used to do anything unwanted, since it will be
able to do anything that could be done by a user logged in as the
</listitem>
</itemizedlist>
+ This scheme is analogous to the recommendations in <ulink
+ url="http://www.python.org/dev/peps/pep-0394/">PEP 394</ulink> regarding the
+ naming and transitioning of the <command>python</command> command.
+ </para>
+
+ <para>
It depends on the build configuration or the installed packages
whether PL/Python for Python 2 or Python 3 or both are available.
</para>
<sect2>
<title>Data Type Mapping</title>
<para>
- Function arguments are converted from their PostgreSQL type to a
- corresponding Python type:
+ When a PL/Python function is called, its arguments are converted from
+ their PostgreSQL data type to a corresponding Python type:
+
<itemizedlist>
<listitem>
<para>
<para>
PostgreSQL <type>smallint</type> and <type>int</type> are
converted to Python <type>int</type>.
- PostgreSQL <type>bigint</type> is converted
+ PostgreSQL <type>bigint</type> and <type>oid</type> are converted
to <type>long</type> in Python 2 and to <type>int</type> in
Python 3.
</para>
<listitem>
<para>
- PostgreSQL <type>real</type>, <type>double</type>,
- and <type>numeric</type> are converted to
- Python <type>float</type>. Note that for
- the <type>numeric</type> this loses information and can lead to
- incorrect results. This might be fixed in a future
- release.
+ PostgreSQL <type>real</type> and <type>double</type> are converted to
+ Python <type>float</type>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ PostgreSQL <type>numeric</type> is converted to
+ Python <type>Decimal</type>. This type is imported from
+ the <literal>cdecimal</literal> package if that is available.
+ Otherwise,
+ <literal>decimal.Decimal</literal> from the standard library will be
+ used. <literal>cdecimal</literal> is significantly faster
+ than <literal>decimal</literal>. In Python 3.3 and up,
+ however, <literal>cdecimal</literal> has been integrated into the
+ standard library under the name <literal>decimal</literal>, so there is
+ no longer any difference.
</para>
</listitem>
</para>
<para>
- Function return values are converted to the declared PostgreSQL
- return data type as follows:
+ When a PL/Python function returns, its return value is converted to the
+ function's declared PostgreSQL return data type as follows:
+
<itemizedlist>
<listitem>
<para>
When the PostgreSQL return type is <type>bytea</type>, the
return value will be converted to a string (Python 2) or bytes
(Python 3) using the respective Python built-ins, with the
- result being converted <type>bytea</type>.
+ result being converted to <type>bytea</type>.
</para>
</listitem>
<listitem>
<para>
- For all other PostgreSQL return types, the returned Python
- value is converted to a string using the Python
- built-in <literal>str</literal>, and the result is passed to the
- input function of the PostgreSQL data type.
+ For all other PostgreSQL return types, the return value is converted
+ to a string using the Python built-in <literal>str</literal>, and the
+ result is passed to the input function of the PostgreSQL data type.
+ (If the Python value is a <type>float</>, it is converted using
+ the <literal>repr</> built-in instead of <literal>str</literal>, to
+ avoid loss of precision.)
</para>
<para>
<para>
SQL array values are passed into PL/Python as a Python list. To
return an SQL array value out of a PL/Python function, return a
- Python sequence, for example a list or tuple:
+ Python list:
<programlisting>
CREATE FUNCTION return_arr()
RETURNS int[]
AS $$
-return (1, 2, 3, 4, 5)
+return [1, 2, 3, 4, 5]
$$ LANGUAGE plpythonu;
SELECT return_arr();
(1 row)
</programlisting>
+ Multidimensional arrays are passed into PL/Python as nested Python lists.
+ A 2-dimensional array is a list of lists, for example. When returning
+ a multi-dimensional SQL array out of a PL/Python function, the inner
+ lists at each level must all be of the same size. For example:
+
+<programlisting>
+CREATE FUNCTION test_type_conversion_array_int4(x int4[]) RETURNS int4[] AS $$
+plpy.info(x, type(x))
+return x
+$$ LANGUAGE plpythonu;
+
+SELECT * FROM test_type_conversion_array_int4(ARRAY[[1,2,3],[4,5,6]]);
+INFO: ([[1, 2, 3], [4, 5, 6]], <type 'list'>)
+ test_type_conversion_array_int4
+---------------------------------
+ {{1,2,3},{4,5,6}}
+(1 row)
+</programlisting>
+
+ Other Python sequences, like tuples, are also accepted for
+ backwards-compatibility with PostgreSQL versions 9.6 and below, when
+ multi-dimensional arrays were not supported. However, they are always
+ treated as one-dimensional arrays, because they are ambiguous with
+ composite types. For the same reason, when a composite type is used in a
+ multi-dimensional array, it must be represented by a tuple, rather than a
+ list.
+ </para>
+ <para>
Note that in Python, strings are sequences, which can have
undesirable effects that might be familiar to Python programmers:
CREATE FUNCTION make_pair (name text, value integer)
RETURNS named_value
AS $$
- return [ name, value ]
- # or alternatively, as tuple: return ( name, value )
+ return ( name, value )
+ # or alternatively, as tuple: return [ name, value ]
$$ LANGUAGE plpythonu;
</programlisting>
To return a SQL null for any column, insert <symbol>None</symbol> at
the corresponding position.
</para>
+ <para>
+ When an array of composite types is returned, it cannot be returned as a list,
+ because it is ambiguous whether the Python list represents a composite type,
+ or another array dimension.
+ </para>
</listitem>
</varlistentry>
$$ LANGUAGE plpythonu;
</programlisting>
- <warning>
- <para>
- Due to Python
- <ulink url="http://bugs.python.org/issue1483133">bug #1483133</ulink>,
- some debug versions of Python 2.4
- (configured and compiled with option <literal>--with-pydebug</literal>)
- are known to crash the <productname>PostgreSQL</productname> server
- when using an iterator to return a set result.
- Unpatched versions of Fedora 4 contain this bug.
- It does not happen in production versions of Python or on patched
- versions of Fedora 4.
- </para>
- </warning>
</para>
</listitem>
</varlistentry>
<title>Database Access Functions</title>
<para>
- The <literal>plpy</literal> module provides two
- functions called <function>execute</function> and
- <function>prepare</function>. Calling
- <function>plpy.execute</function> with a query string and an
- optional limit argument causes that query to be run and the result
- to be returned in a result object. The result object emulates a
- list or dictionary object. The result object can be accessed by
- row number and column name. It has these additional methods:
- <function>nrows</function> which returns the number of rows
- returned by the query, and <function>status</function> which is the
- <function>SPI_execute()</function> return value. The result object
- can be modified.
+ The <literal>plpy</literal> module provides several functions to execute
+ database commands:
</para>
- <para>
- For example:
+ <variablelist>
+ <varlistentry>
+ <term><literal>plpy.<function>execute</function>(<replaceable>query</replaceable> [, <replaceable>max-rows</replaceable>])</literal></term>
+ <listitem>
+ <para>
+ Calling <function>plpy.execute</function> with a query string and an
+ optional row limit argument causes that query to be run and the result to
+ be returned in a result object.
+ </para>
+
+ <para>
+ The result object emulates a list or dictionary object. The result
+ object can be accessed by row number and column name. For example:
<programlisting>
rv = plpy.execute("SELECT * FROM my_table", 5)
</programlisting>
- returns up to 5 rows from <literal>my_table</literal>. If
- <literal>my_table</literal> has a column
- <literal>my_column</literal>, it would be accessed as:
+ returns up to 5 rows from <literal>my_table</literal>. If
+ <literal>my_table</literal> has a column
+ <literal>my_column</literal>, it would be accessed as:
<programlisting>
foo = rv[i]["my_column"]
</programlisting>
- </para>
+ The number of rows returned can be obtained using the built-in
+ <function>len</function> function.
+ </para>
- <para>
- <indexterm><primary>preparing a query</><secondary>in PL/Python</></indexterm>
- The second function, <function>plpy.prepare</function>, prepares
- the execution plan for a query. It is called with a query string
- and a list of parameter types, if you have parameter references in
- the query. For example:
+ <para>
+ The result object has these additional methods:
+ <variablelist>
+ <varlistentry>
+ <term><literal><function>nrows</function>()</literal></term>
+ <listitem>
+ <para>
+ Returns the number of rows processed by the command. Note that this
+ is not necessarily the same as the number of rows returned. For
+ example, an <command>UPDATE</command> command will set this value but
+ won't return any rows (unless <literal>RETURNING</literal> is used).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal><function>status</function>()</literal></term>
+ <listitem>
+ <para>
+ The <function>SPI_execute()</function> return value.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal><function>colnames</function>()</literal></term>
+ <term><literal><function>coltypes</function>()</literal></term>
+ <term><literal><function>coltypmods</function>()</literal></term>
+ <listitem>
+ <para>
+ Return a list of column names, list of column type OIDs, and list of
+ type-specific type modifiers for the columns, respectively.
+ </para>
+
+ <para>
+ These methods raise an exception when called on a result object from
+ a command that did not produce a result set, e.g.,
+ <command>UPDATE</command> without <literal>RETURNING</literal>, or
+ <command>DROP TABLE</command>. But it is OK to use these methods on
+ a result set containing zero rows.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal><function>__str__</function>()</literal></term>
+ <listitem>
+ <para>
+ The standard <literal>__str__</literal> method is defined so that it
+ is possible for example to debug query execution results
+ using <literal>plpy.debug(rv)</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The result object can be modified.
+ </para>
+
+ <para>
+ Note that calling <literal>plpy.execute</literal> will cause the entire
+ result set to be read into memory. Only use that function when you are
+ sure that the result set will be relatively small. If you don't want to
+ risk excessive memory usage when fetching large results,
+ use <literal>plpy.cursor</literal> rather
+ than <literal>plpy.execute</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>plpy.<function>prepare</function>(<replaceable>query</replaceable> [, <replaceable>argtypes</replaceable>])</literal></term>
+ <term><literal>plpy.<function>execute</function>(<replaceable>plan</replaceable> [, <replaceable>arguments</replaceable> [, <replaceable>max-rows</replaceable>]])</literal></term>
+ <listitem>
+ <para>
+ <indexterm><primary>preparing a query</><secondary>in PL/Python</></indexterm>
+ <function>plpy.prepare</function> prepares the execution plan for a
+ query. It is called with a query string and a list of parameter types,
+ if you have parameter references in the query. For example:
<programlisting>
-plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1", [ "text" ])
+plan = plpy.prepare("SELECT last_name FROM my_users WHERE first_name = $1", ["text"])
</programlisting>
- <literal>text</literal> is the type of the variable you will be
- passing for <literal>$1</literal>. After preparing a statement, you
- use the function <function>plpy.execute</function> to run it:
+ <literal>text</literal> is the type of the variable you will be passing
+ for <literal>$1</literal>. The second argument is optional if you don't
+ want to pass any parameters to the query.
+ </para>
+ <para>
+ After preparing a statement, you use a variant of the
+ function <function>plpy.execute</function> to run it:
<programlisting>
-rv = plpy.execute(plan, [ "name" ], 5)
+rv = plpy.execute(plan, ["name"], 5)
</programlisting>
- The third argument is the limit and is optional.
- </para>
+ Pass the plan as the first argument (instead of the query string), and a
+ list of values to substitute into the query as the second argument. The
+ second argument is optional if the query does not expect any parameters.
+ The third argument is the optional row limit as before.
+ </para>
- <para>
- Query parameters and result row fields are converted between
- PostgreSQL and Python data types as described
- in <xref linkend="plpython-data">. The exception is that composite
- types are currently not supported: They will be rejected as query
- parameters and are converted to strings when appearing in a query
- result. As a workaround for the latter problem, the query can
- sometimes be rewritten so that the composite type result appears as
- a result row rather than as a field of the result row.
- Alternatively, the resulting string could be parsed apart by hand,
- but this approach is not recommended because it is not
- future-proof.
- </para>
+ <para>
+ Alternatively, you can call the <function>execute</function> method on
+ the plan object:
+<programlisting>
+rv = plan.execute(["name"], 5)
+</programlisting>
+ </para>
- <para>
- When you prepare a plan using the PL/Python module it is
- automatically saved. Read the SPI documentation (<xref
- linkend="spi">) for a description of what this means.
- In order to make effective use of this across function calls
- one needs to use one of the persistent storage dictionaries
- <literal>SD</literal> or <literal>GD</literal> (see
- <xref linkend="plpython-sharing">). For example:
+ <para>
+ Query parameters and result row fields are converted between PostgreSQL
+ and Python data types as described in <xref linkend="plpython-data">.
+ </para>
+
+ <para>
+ When you prepare a plan using the PL/Python module it is automatically
+ saved. Read the SPI documentation (<xref linkend="spi">) for a
+ description of what this means. In order to make effective use of this
+ across function calls one needs to use one of the persistent storage
+ dictionaries <literal>SD</literal> or <literal>GD</literal> (see
+ <xref linkend="plpython-sharing">). For example:
<programlisting>
CREATE FUNCTION usesavedplan() RETURNS trigger AS $$
- if SD.has_key("plan"):
+ if "plan" in SD:
plan = SD["plan"]
else:
plan = plpy.prepare("SELECT 1")
# rest of function
$$ LANGUAGE plpythonu;
</programlisting>
- </para>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>plpy.<function>cursor</function>(<replaceable>query</replaceable>)</literal></term>
+ <term><literal>plpy.<function>cursor</function>(<replaceable>plan</replaceable> [, <replaceable>arguments</replaceable>])</literal></term>
+ <listitem>
+ <para>
+ The <literal>plpy.cursor</literal> function accepts the same arguments
+ as <literal>plpy.execute</literal> (except for the row limit) and returns
+ a cursor object, which allows you to process large result sets in smaller
+ chunks. As with <literal>plpy.execute</literal>, either a query string
+ or a plan object along with a list of arguments can be used, or
+ the <function>cursor</function> function can be called as a method of
+ the plan object.
+ </para>
+
+ <para>
+ The cursor object provides a <literal>fetch</literal> method that accepts
+ an integer parameter and returns a result object. Each time you
+ call <literal>fetch</literal>, the returned object will contain the next
+ batch of rows, never larger than the parameter value. Once all rows are
+ exhausted, <literal>fetch</literal> starts returning an empty result
+ object. Cursor objects also provide an
+ <ulink url="http://docs.python.org/library/stdtypes.html#iterator-types">iterator
+ interface</ulink>, yielding one row at a time until all rows are
+ exhausted. Data fetched that way is not returned as result objects, but
+ rather as dictionaries, each dictionary corresponding to a single result
+ row.
+ </para>
+
+ <para>
+ An example of two ways of processing data from a large table is:
+<programlisting>
+CREATE FUNCTION count_odd_iterator() RETURNS integer AS $$
+odd = 0
+for row in plpy.cursor("select num from largetable"):
+ if row['num'] % 2:
+ odd += 1
+return odd
+$$ LANGUAGE plpythonu;
+
+CREATE FUNCTION count_odd_fetch(batch_size integer) RETURNS integer AS $$
+odd = 0
+cursor = plpy.cursor("select num from largetable")
+while True:
+ rows = cursor.fetch(batch_size)
+ if not rows:
+ break
+ for row in rows:
+ if row['num'] % 2:
+ odd += 1
+return odd
+$$ LANGUAGE plpythonu;
+
+CREATE FUNCTION count_odd_prepared() RETURNS integer AS $$
+odd = 0
+plan = plpy.prepare("select num from largetable where num % $1 <> 0", ["integer"])
+rows = list(plpy.cursor(plan, [2])) # or: = list(plan.cursor([2]))
+
+return len(rows)
+$$ LANGUAGE plpythonu;
+</programlisting>
+ </para>
+
+ <para>
+ Cursors are automatically disposed of. But if you want to explicitly
+ release all resources held by a cursor, use the <literal>close</literal>
+ method. Once closed, a cursor cannot be fetched from anymore.
+ </para>
+
+ <tip>
+ <para>
+ Do not confuse objects created by <literal>plpy.cursor</literal> with
+ DB-API cursors as defined by
+ the <ulink url="http://www.python.org/dev/peps/pep-0249/">Python
+ Database API specification</ulink>. They don't have anything in common
+ except for the name.
+ </para>
+ </tip>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<title>Utility Functions</title>
<para>
The <literal>plpy</literal> module also provides the functions
- <literal>plpy.debug(<replaceable>msg</>)</literal>,
- <literal>plpy.log(<replaceable>msg</>)</literal>,
- <literal>plpy.info(<replaceable>msg</>)</literal>,
- <literal>plpy.notice(<replaceable>msg</>)</literal>,
- <literal>plpy.warning(<replaceable>msg</>)</literal>,
- <literal>plpy.error(<replaceable>msg</>)</literal>, and
- <literal>plpy.fatal(<replaceable>msg</>)</literal>.<indexterm><primary>elog</><secondary>in PL/Python</></indexterm>
- <function>plpy.error</function> and
- <function>plpy.fatal</function> actually raise a Python exception
- which, if uncaught, propagates out to the calling query, causing
- the current transaction or subtransaction to be aborted.
- <literal>raise plpy.Error(<replaceable>msg</>)</literal> and
+ <simplelist>
+ <member><literal>plpy.debug(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.log(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.info(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.notice(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.warning(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.error(<replaceable>msg, **kwargs</>)</literal></member>
+ <member><literal>plpy.fatal(<replaceable>msg, **kwargs</>)</literal></member>
+ </simplelist>
+ <indexterm><primary>elog</><secondary>in PL/Python</></indexterm>
+ <function>plpy.error</function> and <function>plpy.fatal</function>
+ actually raise a Python exception which, if uncaught, propagates out to
+ the calling query, causing the current transaction or subtransaction to
+ be aborted. <literal>raise plpy.Error(<replaceable>msg</>)</literal> and
<literal>raise plpy.Fatal(<replaceable>msg</>)</literal> are
- equivalent to calling
- <function>plpy.error</function> and
- <function>plpy.fatal</function>, respectively.
- The other functions only generate messages of different
- priority levels.
+ equivalent to calling <literal>plpy.error(<replaceable>msg</>)</literal> and
+ <literal>plpy.fatal(<replaceable>msg</>)</literal>, respectively but
+ the <literal>raise</literal> form does not allow passing keyword arguments.
+ The other functions only generate messages of different priority levels.
Whether messages of a particular priority are reported to the client,
written to the server log, or both is controlled by the
<xref linkend="guc-log-min-messages"> and
variables. See <xref linkend="runtime-config"> for more information.
</para>
+ <para>
+ The <replaceable>msg</> argument is given as a positional argument. For
+ backward compatibility, more than one positional argument can be given. In
+ that case, the string representation of the tuple of positional arguments
+ becomes the message reported to the client.
+ </para>
+
+ <para>
+ The following keyword-only arguments are accepted:
+ <simplelist>
+ <member><literal>detail</literal></member>
+ <member><literal>hint</literal></member>
+ <member><literal>sqlstate</literal></member>
+ <member><literal>schema_name</literal></member>
+ <member><literal>table_name</literal></member>
+ <member><literal>column_name</literal></member>
+ <member><literal>datatype_name</literal></member>
+ <member><literal>constraint_name</literal></member>
+ </simplelist>
+ The string representation of the objects passed as keyword-only arguments
+ is used to enrich the messages reported to the client. For example:
+
+<programlisting>
+CREATE FUNCTION raise_custom_exception() RETURNS void AS $$
+plpy.error("custom exception message",
+ detail="some info about exception",
+ hint="hint for users")
+$$ LANGUAGE plpythonu;
+
+=# SELECT raise_custom_exception();
+ERROR: plpy.Error: custom exception message
+DETAIL: some info about exception
+HINT: hint for users
+CONTEXT: Traceback (most recent call last):
+ PL/Python function "raise_custom_exception", line 4, in <module>
+ hint="hint for users")
+PL/Python function "raise_custom_exception"
+</programlisting>
+ </para>
+
<para>
Another set of utility functions are
<literal>plpy.quote_literal(<replaceable>string</>)</literal>,
<listitem>
<para><envar>PYTHONUSERBASE</envar></para>
</listitem>
+
+ <listitem>
+ <para><envar>PYTHONHASHSEED</envar></para>
+ </listitem>
</itemizedlist>
(It appears to be a Python implementation detail beyond the control