]> granicus.if.org Git - postgresql/commitdiff
I've created a new section to func.sgml, "Binary String Functions and
authorBruce Momjian <bruce@momjian.us>
Tue, 20 Nov 2001 15:42:44 +0000 (15:42 +0000)
committerBruce Momjian <bruce@momjian.us>
Tue, 20 Nov 2001 15:42:44 +0000 (15:42 +0000)
Operators", plagiarized shamelessly from the "String Functions and
Operators" section. There were enough differences that it made sense (at
least to me) to give this its own section instead of cramming it in with
normal string functions. This way I could also make the examples
relevant, which is particularly important for bytea.

One thing I think worth mentioning: while documenting the trim()
function I realized that I never implemented the bytea equivalent of
rtrim and ltrim. Therefore, the 'leading' and 'trailing' forms of trim,
available with text, are not available with bytea (I'd be happy to
correct this, but since it would require an initdb, I guess not until
7.3) -- the submitted doc accurately reflects this.

I will look for other areas of the docs that need mention of bytea, but
any guidance would be much appreciated.

--

Here's a second bytea documentation patch. This one significantly
expands the "Binary Data" section added by Bruce recently.

Joe Conway

doc/src/sgml/datatype.sgml
doc/src/sgml/func.sgml

index 3c9c3ce5270c7a1639a0109af26d926c2a333b59..016993c0eb6f79bef69ba562d923212fa1976813 100644 (file)
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.71 2001/11/19 09:05:00 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v 1.72 2001/11/20 15:42:44 momjian Exp $
 -->
 
  <chapter id="datatype">
@@ -965,23 +965,296 @@ SELECT b, char_length(b) FROM test2;
 
   </sect1>
 
-  <sect1 id="datatype-binary">
-   <title>Binary Data</title>
+ <sect1 id="datatype-binary">
+  <title>Binary Strings</title>
+   <para>
+    The <type>bytea</type> data type allows storage of binary strings.
+   </para>
+
+   <table tocentry="1">
+    <title>Binary String Types</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Type Name</entry>
+       <entry>Storage</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>bytea</entry>
+       <entry>4 bytes plus the actual string</entry>
+       <entry>Variable (not specifically limited)
+              length binary string</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    A binary string is a sequence of octets that does not have either a
+    character set or collation associated with it.  Bytea specifically
+    allows storage of NULLs and other 'non-printable' <acronym>ASCII
+    </acronym> characters.
+   </para>
 
    <para>
-    The <type>bytea</type> data type allows storage of binary data,
-    specifically allowing storage of NULLs which are entered as
-    <literal>'\\000'</>. The first backslash is interpreted by the
-    single quotes, and the second is recognized by <type>bytea</> and
-    precedes a three digit octal value. For a similar reason, a
-    backslash must be entered into a field as <literal>'\\\\'</> or
-    <literal>'\\134'</>. You may also have to escape line feeds and
-    carriage return if your interface automatically translates these. It
-    can store values of any length. <type>Bytea</> is a non-standard
-    data type.
+    Certain <acronym>ASCII</acronym> characters MUST be escaped (but all
+    characters MAY be escaped) when used as part of a string literal in an
+    <acronym>SQL</acronym> statement. In general, to escape a character, it
+    is converted into the three digit octal number equal to the decimal
+    <acronym>ASCII</acronym> value, and preceeded by two backslashes. The
+    single quote (') and backslash (\) characters have special alternate
+    escape sequences. Details are in
+    <xref linkend="datatype-binary-sqlesc">.
    </para>
+
+   <table id="datatype-binary-sqlesc">
+    <title><acronym>SQL</acronym> Literal Escaped <acronym>ASCII</acronym>
+           Characters</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Decimal <acronym>ASCII</acronym> Value</entry>
+       <entry>Description</entry>
+       <entry>Input Escaped Representation</entry>
+       <entry>Example</entry>
+       <entry>Printed Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry> <literal> 0 </literal> </entry>
+       <entry> null byte </entry>
+       <entry> <literal> '\\000' </literal> </entry>
+       <entry> <literal> select '\\000'::bytea; </literal> </entry>
+       <entry> <literal> \000 </literal></entry>
+      </row>
+
+      <row>
+       <entry> <literal> 39 </literal> </entry>
+       <entry> single quote </entry>
+       <entry> <literal> '\\'' or '\\047' </literal> </entry>
+       <entry> <literal> select '\''::bytea; </literal></entry>
+       <entry> <literal> ' </literal></entry>
+      </row>
+
+      <row>
+       <entry> <literal>92</literal> </entry>
+       <entry> backslash </entry>
+       <entry> <literal> '\\\\' or '\\134' </literal> </entry>
+       <entry> <literal> select '\\\\'::bytea; </literal></entry>
+       <entry> <literal> \\ </literal></entry>
+      </row>
+
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    Note that the result in each of the examples above was exactly one
+    byte in length, even though the output representation of the null byte
+    and backslash are more than one character. Bytea output characters
+    are also escaped. In general, each "non-printable" character is
+    converted into the three digit octal number equal to its decimal
+    <acronym>ASCII</acronym> value, and preceeded by one backslash. Most
+    "printable" characters are represented by their standard
+    <acronym>ASCII</acronym> representation. The backslash (\) character
+    has a special alternate output representation. Details are in
+    <xref linkend="datatype-binary-resesc">.
+   </para>
+
+   <table id="datatype-binary-resesc">
+    <title><acronym>SQL</acronym> Output Escaped <acronym>ASCII</acronym>
+           Characters</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Decimal <acronym>ASCII</acronym> Value</entry>
+       <entry>Description</entry>
+       <entry>Output Escaped Representation</entry>
+       <entry>Example</entry>
+       <entry>Printed Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+
+      <row>
+       <entry> <literal> 39 </literal> </entry>
+       <entry> single quote </entry>
+       <entry> <literal> ' </literal> </entry>
+       <entry> <literal> select '\\047'::bytea; </literal></entry>
+       <entry> <literal> ' </literal></entry>
+      </row>
+
+      <row>
+       <entry> <literal> 92 </literal> </entry>
+       <entry> backslash </entry>
+       <entry> <literal> \\ </literal> </entry>
+       <entry> <literal> select '\\134'::bytea; </literal></entry>
+       <entry> <literal> \\ </literal></entry>
+      </row>
+
+      <row>
+       <entry> <literal> 0 to 31 and 127 to 255 </literal> </entry>
+       <entry> non-printable characters </entry>
+       <entry> <literal> \### (octal value) </literal> </entry>
+       <entry> <literal> select '\\001'::bytea; </literal> </entry>
+       <entry> <literal> \001 </literal></entry>
+      </row>
+
+      <row>
+       <entry> <literal> 32 to 126 </literal> </entry>
+       <entry> printable characters </entry>
+       <entry> ASCII representation </entry>
+       <entry> <literal>  select '\\176'::bytea; </literal> </entry>
+       <entry> <literal> ~ </literal></entry>
+      </row>
+
+     </tbody>
+    </tgroup>
+   </table>
+
+   <para>
+    <acronym>SQL</acronym> string literals (input strings) must be
+    preceeded with two backslashes due to the fact that they must pass
+    through two parsers in the PostgreSQL backend. The first backslash
+    is interpreted as an escape character by the string literal parser,
+    and therefore is consumed, leaving the characters that follow it.
+    The second backslash is recognized by <type>bytea</> input function
+    as the prefix of a three digit octal value. For example, a string
+    literal passed to the backend as <literal>'\\001'</literal> becomes
+    <literal>'\001'</literal> after passing through the string literal
+    parser. The <literal>'\001'</literal> is then sent to the bytea
+    input function, where it is converted to a single byte with a decimal
+    <acronym>ASCII</acronym> value of 1.
+   </para>
+
+   <para>
+    For a similar reason, a backslash must be input as
+    <literal>'\\\\'</literal> (or <literal>'\\134'</literal>). The first
+    and third backslashes are interpreted as escape characters by the
+    string literal parser, and therefore are consumed, leaving the
+    second and forth backslashes untouched. The second and forth
+    backslashes are recognized by <type>bytea</> input function as a single
+    backslash. For example, a string literal passed to the backend as
+    <literal>'\\\\'</literal> becomes <literal>'\\'</literal> after passing
+    through the string literal parser. The <literal>'\\'</literal> is then
+    sent to the bytea input function, where it is converted to a single
+    byte with a decimal <acronym>ASCII</acronym> value of 92.
+   </para>
+
+   <para>
+    A single quote is a bit different in that it must be input as
+    <literal>'\''</literal> (or <literal>'\\134'</literal>), NOT as
+    <literal>'\\''</literal>. This is because, while the literal parser
+    interprets the single quote as a special character, and will consume
+    the single backslash, the bytea input function does NOT recognize
+    a single quote as a special character. Therefore a string
+    literal passed to the backend as <literal>'\''</literal> becomes
+    <literal>'''</literal> after passing through the string literal
+    parser. The <literal>'''</literal> is then sent to the bytea
+    input function, where it is retains its single byte decimal
+    <acronym>ASCII</acronym> value of 39.
+   </para>
+
+   <para>
+    Depending on the front end to PostgreSQL you use, you may have
+    additional work to do in terms of escaping and unescaping bytea
+    strings. For example, you may also have to escape line feeds and
+    carriage return if your interface automatically translates these.
+    Or you may have to double up on backslashes if the parser for your
+    language or choice also treats them as an escape character.
+   </para>
+
+   <sect2 id="datatype-binary-compat">
+    <title>Compatibility</title>
+    <para>
+     Bytea provides most of the functionality of the SQL99 binary string
+     type per SQL99 section 4.3. A comparison of PostgreSQL bytea and SQL99
+     Binary Strings is presented in
+     <xref linkend="datatype-binary-compat-comp">.
+    </para>
+
+    <table id="datatype-binary-compat-comp">
+     <title>Comparison of SQL99 Binary String and BYTEA types</title>
+     <tgroup cols="2">
+      <thead>
+       <row>
+        <entry>SQL99</entry>
+        <entry>BYTEA</entry>
+       </row>
+      </thead>
+
+      <tbody>
+       <row>
+        <entry> Name of data type BINARY LARGE OBJECT or BLOB </entry>
+        <entry> Name of data type BYTEA </entry>
+       </row>
+
+       <row>
+        <entry> Sequence of octets that does not have either a character set
+                or collation associated with it. </entry>
+        <entry> same </entry>
+       </row>
+
+       <row>
+        <entry> Described by a binary data type descriptor containing the
+                name of the data type and the maximum length
+                in octets</entry>
+        <entry> Described by a binary data type descriptor containing the
+                name of the data type with no specific maximum length
+                </entry>
+       </row>
+
+       <row>
+        <entry> All binary strings are mutually comparable in accordance
+                with the rules of comparison predicates.</entry>
+        <entry> same</entry>
+       </row>
+
+       <row>
+        <entry> Binary string values can only be compared for equality.
+                </entry>
+        <entry> Binary string values can be compared for equality, greater
+                than, greater than or equal, less than, less than or equal
+                </entry>
+       </row>
+
+       <row>
+        <entry> Operators operating on and returning binary strings
+                include concatenation, substring, overlay, and trim</entry>
+        <entry> Operators operating on and returning binary strings
+                include concatenation, substring, and trim. The <literal>
+                'leading'</literal> and <literal>'trailing'</literal>
+                arguments for trim are not yet implemented.
+                </entry>
+       </row>
+
+       <row>
+        <entry> Other operators involving binary strings
+                include length, position, and the like predicate</entry>
+        <entry> same</entry>
+       </row>
+
+       <row>
+        <entry> A binary string literal is comprised of an even number of
+                hexidecimal digits, in single quotes, preceeded by "X",
+                e.g. X'1a43fe'</entry>
+        <entry> A binary string literal is comprised of ASCII characters
+                escaped according to the rules shown in 
+                <xref linkend="datatype-binary-sqlesc"> </entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+   </sect2>
   </sect1>
 
+
   <sect1 id="datatype-datetime">
    <title>Date/Time Types</title>
 
index e9d9b47aeb8f41a26c0a951c0dce1c24e610435b..8af9fd06796bedf19e9d033d3a7e9c9ab19a3622 100644 (file)
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.82 2001/11/19 09:05:01 tgl Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/func.sgml,v 1.83 2001/11/20 15:42:44 momjian Exp $
 Postgres documentation
 -->
 
@@ -1133,7 +1133,7 @@ Postgres documentation
       <entry><type>text</type></entry>
       <entry>
        Encodes binary data to <acronym>ASCII</acronym>-only representation.  Supported
-       types are: 'base64', 'hex'.
+       types are: 'base64', 'hex', 'escape'.
       </entry>
       <entry><literal>encode('123\\000\\001', 'base64')</literal></entry>
       <entry><literal>MTIzAAE=</literal></entry>
@@ -1164,6 +1164,186 @@ Postgres documentation
  </sect1>
 
 
+ <sect1 id="functions-binarystring">
+  <title>Binary String Functions and Operators</title>
+
+  <para>
+   This section describes functions and operators for examining and
+   manipulating binary string values.  Strings in this context include
+   values of the type <type>BYTEA</type>.
+  </para>
+
+  <para>
+   <acronym>SQL</acronym> defines some string functions with a special syntax where
+   certain keywords rather than commas are used to separate the
+   arguments.  Details are in <xref linkend="functions-binarystring-sql">.
+   Some functions are also implemented using the regular syntax for
+   function invocation.  (See <xref linkend="functions-binarystring-other">.)
+  </para>
+
+  <table id="functions-binarystring-sql">
+   <title><acronym>SQL</acronym> Binary String Functions and Operators</title>
+   <tgroup cols="5">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Return Type</entry>
+      <entry>Description</entry>
+      <entry>Example</entry>
+      <entry>Result</entry>  
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry> <parameter>string</parameter> <literal>||</literal> <parameter>string</parameter> </entry>
+      <entry> <type>bytea</type> </entry>
+      <entry>
+       string concatenation
+       <indexterm>
+        <primary>binary strings</primary>
+       <secondary>concatenation</secondary>
+       </indexterm>
+      </entry>
+      <entry><literal>'\\\\Postgre'::bytea || '\\047SQL\\000'::bytea</></entry>
+      <entry><literal>\\Postgre'SQL\000</></entry>
+     </row>
+
+     <row>
+      <entry><function>octet_length</function>(<parameter>string</parameter>)</entry>
+      <entry><type>integer</type></entry>
+      <entry>number of bytes in binary string</entry>
+      <entry><literal>octet_length('jo\\000se'::bytea)</literal></entry>
+      <entry><literal>5</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>position</function>(<parameter>substring</parameter> in <parameter>string</parameter>)</entry>
+      <entry><type>integer</type></entry>
+      <entry>location of specified substring</entry>
+      <entry><literal>position('\\000om'::bytea in 'Th\\000omas'::bytea)</literal></entry>
+      <entry><literal>3</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>substring</function>(<parameter>string</parameter> <optional>from <type>integer</type></optional> <optional>for <type>integer</type></optional>)</entry>
+      <entry><type>bytea</type></entry>
+      <entry>
+       extract substring
+       <indexterm>
+        <primary>substring</primary>
+       </indexterm>
+      </entry>
+      <entry><literal>substring('Th\\000omas'::bytea from 2 for 3)</literal></entry>
+      <entry><literal>h\000o</literal></entry>
+     </row>
+
+     <row>
+      <entry>
+       <function>trim</function>(<optional>both</optional>
+       <parameter>characters</parameter> from
+       <parameter>string</parameter>)
+      </entry>
+      <entry><type>bytea</type></entry>
+      <entry>
+       Removes the longest string containing only the
+       <parameter>characters</parameter> from the
+       beginning/end/both ends of the <parameter>string</parameter>.
+      </entry>
+      <entry><literal>trim('\\000'::bytea from '\\000Tom\\000'::bytea)</literal></entry>
+      <entry><literal>Tom</literal></entry>
+     </row>
+
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   Additional binary string manipulation functions are available and are
+   listed below.  Some of them are used internally to implement the
+   <acronym>SQL</acronym>-standard string functions listed above.
+  </para>
+
+  <table id="functions-binarystring-other">
+   <title>Other Binary String Functions</title>
+   <tgroup cols="5">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Return Type</entry>
+      <entry>Description</entry>
+      <entry>Example</entry>
+      <entry>Result</entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry><function>btrim</function>(<parameter>string</parameter> <type>bytea</type>, <parameter>trim</parameter> <type>bytea</type>)</entry>
+      <entry><type>bytea</type></entry>
+      <entry>
+       Remove (trim) the longest string consisting only of characters
+       in <parameter>trim</parameter> from the start and end of
+       <parameter>string</parameter>.
+      </entry>
+      <entry><literal>btrim('\\000trim\\000'::bytea,'\\000'::bytea)</literal></entry>
+      <entry><literal>trim</literal></entry>
+     </row>
+
+     <row>
+      <entry><function>length</function>(<parameter>string</parameter>)</entry>
+      <entry><type>integer</type></entry>
+      <entry>
+       length of binary string
+       <indexterm>
+        <primary>binary strings</primary>
+       <secondary>length</secondary>
+       </indexterm>
+       <indexterm>
+        <primary>length</primary>
+       <secondary>binary strings</secondary>
+       <see>binary strings, length</see>
+       </indexterm>
+      </entry>
+      <entry><literal>length('jo\\000se'::bytea)</></entry>
+      <entry><literal>5</></entry>
+     </row>
+
+     <row>
+      <entry>
+       <function>encode</function>(<parameter>string</parameter> <type>bytea</type>,
+              <parameter>type</parameter> <type>text</type>)
+      </entry>
+      <entry><type>text</type></entry>
+      <entry>
+       Encodes binary string to <acronym>ASCII</acronym>-only representation.  Supported
+       types are: 'base64', 'hex', 'escape'.
+      </entry>
+      <entry><literal>encode('123\\000456'::bytea, 'escape')</literal></entry>
+      <entry><literal>123\000456</literal></entry>
+     </row>       
+
+     <row>
+      <entry>
+       <function>decode</function>(<parameter>string</parameter> <type>text</type>,
+              <parameter>type</parameter> <type>text</type>)
+      </entry>
+      <entry><type>bytea</type></entry>
+      <entry>
+       Decodes binary string from <parameter>string</parameter> previously 
+       encoded with encode().  Parameter type is same as in encode().
+      </entry>
+      <entry><literal>decode('123\\000456', 'escape')</literal></entry>
+      <entry><literal>123\000456</literal></entry>
+     </row>       
+
+    </tbody>
+   </tgroup>
+  </table>
+
+ </sect1>
+
+
  <sect1 id="functions-matching">
   <title>Pattern Matching</title>