From: Bruce Momjian Date: Mon, 27 Apr 2009 16:27:36 +0000 (+0000) Subject: Proofreading adjustments for first two parts of documentation (Tutorial X-Git-Tag: REL8_4_BETA2~68 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=ba36c48e39747678412d48bcbf6ed14cb2dc8ddf;p=postgresql Proofreading adjustments for first two parts of documentation (Tutorial and SQL). --- diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 340f7e5802..305ce9cc57 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -1,4 +1,4 @@ - + Advanced Features @@ -19,10 +19,10 @@ This chapter will on occasion refer to examples found in to change or improve them, so it will be - of advantage if you have read that chapter. Some examples from + good if you have read that chapter. Some examples from this chapter can also be found in advanced.sql in the tutorial directory. This - file also contains some example data to load, which is not + file also contains some sample data to load, which is not repeated here. (Refer to for how to use the file.) @@ -173,7 +173,7 @@ UPDATE branches SET balance = balance + 100.00 - The details of these commands are not important here; the important + The details of these commands are not important; the important point is that there are several separate updates involved to accomplish this rather simple operation. Our bank's officers will want to be assured that either all these updates happen, or none of them happen. @@ -307,7 +307,7 @@ COMMIT; This example is, of course, oversimplified, but there's a lot of control - to be had over a transaction block through the use of savepoints. + possible in a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 84575ea52e..54386ca264 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -1,4 +1,4 @@ - + Overview of PostgreSQL Internals @@ -67,7 +67,7 @@ One application of the rewrite system is in the realization of views. Whenever a query against a view - (i.e. a virtual table) is made, + (i.e., a virtual table) is made, the rewrite system rewrites the user's query to a query that accesses the base tables given in the view definition instead. @@ -145,7 +145,7 @@ Once a connection is established the client process can send a query to the backend (server). The query is transmitted using plain text, - i.e. there is no parsing done in the frontend (client). The + i.e., there is no parsing done in the frontend (client). The server parses the query, creates an execution plan, executes the plan and returns the retrieved rows to the client by transmitting them over the established connection. @@ -442,7 +442,7 @@ relations, a near-exhaustive search is conducted to find the best join sequence. The planner preferentially considers joins between any two relations for which there exist a corresponding join clause in the - WHERE qualification (i.e. for + WHERE qualification (i.e., for which a restriction like where rel1.attr1=rel2.attr2 exists). 
Join pairs with no join clause are considered only when there is
no other choice, that is, a particular relation has no available
diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index 08a3ee021d..6e731e1448 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -1,4 +1,4 @@
- +
Arrays
@@ -54,23 +54,24 @@ CREATE TABLE tictactoe (
);
- However, the current implementation does not enforce the array size
- limits — the behavior is the same as for arrays of unspecified
+ However, the current implementation ignores any supplied array size
+ limits, i.e., the behavior is the same as for arrays of unspecified
length.
- Actually, the current implementation does not enforce the declared
+ In addition, the current implementation does not enforce the declared
number of dimensions either. Arrays of a particular element type are
all considered to be of the same type, regardless of size or number
- of dimensions. So, declaring number of dimensions or sizes in
+ of dimensions. So, declaring the number of dimensions or sizes in
CREATE TABLE is simply documentation, it does not
affect run-time behavior.
- An alternative syntax, which conforms to the SQL standard, can
- be used for one-dimensional arrays.
+ An alternative syntax, which conforms to the SQL standard by using
+ the keyword ARRAY, can
+ be used for one-dimensional arrays;
pay_by_quarter could have been defined
as:
@@ -107,9 +108,9 @@ CREATE TABLE tictactoe (
where delim is the delimiter character
for the type, as recorded in its pg_type entry.
Among the standard data types provided in the
- PostgreSQL distribution, type
- box uses a semicolon (;) but all the others
- use comma (,). Each val is
+ PostgreSQL distribution, all use a comma
+ (,), except for the type box which uses a semicolon
+ (;). Each val is
either a constant of the array element type, or a subarray.
An example of an array constant is:
@@ -120,7 +121,7 @@ CREATE TABLE tictactoe (
- To set an element of an array constant to NULL, write NULL
+ To set an element of an array to NULL, write NULL
for the element value. (Any upper- or lower-case variant of
NULL will do.) If you want an actual
string value NULL, you must put double quotes around it.
@@ -163,6 +164,19 @@ SELECT * FROM sal_emp;
+
+ Multidimensional arrays must have matching extents for each
+ dimension. A mismatch causes an error, for example:
+
+
+INSERT INTO sal_emp
+ VALUES ('Bill',
+ '{10000, 10000, 10000, 10000}',
+ '{{"meeting", "lunch"}, {"meeting"}}');
+ERROR: multidimensional arrays must have array expressions with matching dimensions
+
+
The ARRAY constructor syntax
can also be used:
@@ -182,19 +196,6 @@ INSERT INTO sal_emp
constructor syntax is discussed in more detail in
.
-
-
- Multidimensional arrays must have matching extents for each
- dimension. A mismatch causes an error report, for example:
-
-
-INSERT INTO sal_emp
- VALUES ('Bill',
- '{10000, 10000, 10000, 10000}',
- '{{"meeting", "lunch"}, {"meeting"}}');
-ERROR: multidimensional arrays must have array expressions with matching dimensions
-
-
@@ -207,7 +208,7 @@ ERROR: multidimensional arrays must have array expressions with matching dimens
Now, we can run some queries on the table.
- First, we show how to access a single element of an array at a time.
+ First, we show how to access a single element of an array.
This query retrieves the names of the employees whose pay changed in the second quarter: @@ -221,7 +222,7 @@ SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2]; The array subscript numbers are written within square brackets. - By default PostgreSQL uses the + By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n]. @@ -257,7 +258,7 @@ SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill'; (1 row) - If any dimension is written as a slice, i.e. contains a colon, then all + If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified. For example, [2] is treated as @@ -288,13 +289,14 @@ SELECT schedule[1:2][2] FROM sal_emp WHERE name = 'Bill'; An array slice expression likewise yields null if the array itself or - any of the subscript expressions are null. However, in other corner + any of the subscript expressions are null. However, in other cases such as selecting an array slice that is completely outside the current array bounds, a slice expression yields an empty (zero-dimensional) array instead of null. (This does not match non-slice behavior and is done for historical reasons.) If the requested slice partially overlaps the array bounds, then it - is silently reduced to just the overlapping region. + is silently reduced to just the overlapping region instead of + returning null. @@ -311,7 +313,7 @@ SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol'; array_dims produces a text result, - which is convenient for people to read but perhaps not so convenient + which is convenient for people to read but perhaps inconvenient for programs. Dimensions can also be retrieved with array_upper and array_lower, which return the upper and lower bound of a @@ -380,12 +382,12 @@ UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}' - A stored array value can be enlarged by assigning to element(s) not already + A stored array value can be enlarged by assigning to elements not already present. Any positions between those previously present and the newly - assigned element(s) will be filled with nulls. For example, if array + assigned elements will be filled with nulls. For example, if array myarray currently has 4 elements, it will have six - elements after an update that assigns to myarray[6], - and myarray[5] will contain a null. + elements after an update that assigns to myarray[6]; + myarray[5] will contain null. Currently, enlargement in this fashion is only allowed for one-dimensional arrays, not multidimensional arrays. @@ -393,11 +395,11 @@ UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}' Subscripted assignment allows creation of arrays that do not use one-based subscripts. For example one might assign to myarray[-2:7] to - create an array with subscript values running from -2 to 7. + create an array with subscript values from -2 to 7. - New array values can also be constructed by using the concatenation operator, + New array values can also be constructed using the concatenation operator, ||: SELECT ARRAY[1,2] || ARRAY[3,4]; @@ -415,14 +417,14 @@ SELECT ARRAY[5,6] || ARRAY[[1,2],[3,4]]; - The concatenation operator allows a single element to be pushed on to the + The concatenation operator allows a single element to be pushed to the beginning or end of a one-dimensional array. 
It also accepts two N-dimensional arrays, or an N-dimensional and an N+1-dimensional array. - When a single element is pushed on to either the beginning or end of a + When a single element is pushed to either the beginning or end of a one-dimensional array, the result is an array with the same lower bound subscript as the array operand. For example: @@ -461,7 +463,7 @@ SELECT array_dims(ARRAY[[1,2],[3,4]] || ARRAY[[5,6],[7,8],[9,0]]); - When an N-dimensional array is pushed on to the beginning + When an N-dimensional array is pushed to the beginning or end of an N+1-dimensional array, the result is analogous to the element-array case above. Each N-dimensional sub-array is essentially an element of the N+1-dimensional @@ -482,7 +484,7 @@ SELECT array_dims(ARRAY[1,2] || ARRAY[[3,4],[5,6]]); arrays, but array_cat supports multidimensional arrays. Note that the concatenation operator discussed above is preferred over - direct use of these functions. In fact, the functions exist primarily for use + direct use of these functions. In fact, these functions primarily exist for use in implementing the concatenation operator. However, they might be directly useful in the creation of user-defined aggregates. Some examples: @@ -528,8 +530,8 @@ SELECT array_cat(ARRAY[5,6], ARRAY[[1,2],[3,4]]); - To search for a value in an array, you must check each value of the - array. This can be done by hand, if you know the size of the array. + To search for a value in an array, each value must be checked. + This can be done manually, if you know the size of the array. For example: @@ -540,7 +542,7 @@ SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR However, this quickly becomes tedious for large arrays, and is not - helpful if the size of the array is uncertain. An alternative method is + helpful if the size of the array is unknown. An alternative method is described in . The above query could be replaced by: @@ -548,7 +550,7 @@ SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter); - In addition, you could find rows where the array had all values + In addition, you can find rows where the array has all values equal to 10000 with: @@ -578,7 +580,7 @@ SELECT * FROM can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to - scale up better to large numbers of elements. + scale better for a large number of elements. @@ -600,9 +602,9 @@ SELECT * FROM The delimiter character is usually a comma (,) but can be something else: it is determined by the typdelim setting for the array's element type. (Among the standard data types provided - in the PostgreSQL distribution, type - box uses a semicolon (;) but all the others - use comma.) In a multidimensional array, each dimension (row, plane, + in the PostgreSQL distribution, all + use a comma, except for box, which uses a semicolon (;).) + In a multidimensional array, each dimension (row, plane, cube, etc.) gets its own level of curly braces, and delimiters must be written between adjacent curly-braced entities of the same level. @@ -614,7 +616,7 @@ SELECT * FROM NULL. Double quotes and backslashes embedded in element values will be backslash-escaped. 
For numeric data types it is safe to assume that double quotes will never appear, but - for textual data types one should be prepared to cope with either presence + for textual data types one should be prepared to cope with either the presence or absence of quotes. @@ -647,27 +649,27 @@ SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2 or backslashes disables this and allows the literal string value NULL to be entered. Also, for backwards compatibility with pre-8.2 versions of PostgreSQL, the configuration parameter might be turned + linkend="guc-array-nulls"> configuration parameter can be turned off to suppress recognition of NULL as a NULL. - As shown previously, when writing an array value you can write double + As shown previously, when writing an array value you can use double quotes around any individual array element. You must do so if the element value would otherwise confuse the array-value parser. - For example, elements containing curly braces, commas (or whatever the - delimiter character is), double quotes, backslashes, or leading or trailing + For example, elements containing curly braces, commas (or the matching + delimiter character), double quotes, backslashes, or leading or trailing whitespace must be double-quoted. Empty strings and strings matching the word NULL must be quoted, too. To put a double quote or backslash in a quoted array element value, use escape string syntax - and precede it with a backslash. Alternatively, you can use + and precede it with a backslash. Alternatively, you can avoid quotes and use backslash-escaping to protect all data characters that would otherwise be taken as array syntax. - You can write whitespace before a left brace or after a right - brace. You can also write whitespace before or after any individual item + You can use whitespace before a left brace or after a right + brace. You can also add whitespace before or after any individual item string. In all of these cases the whitespace will be ignored. However, whitespace within double-quoted elements, or surrounded on both sides by non-whitespace characters of an element, is not ignored. diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index ae9563a8e3..016050664f 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1,4 +1,4 @@ - + Backup and Restore @@ -1523,7 +1523,7 @@ archive_command = 'local_backup_script.sh' - It should be noted that the log shipping is asynchronous, i.e. the WAL + It should be noted that the log shipping is asynchronous, i.e., the WAL records are shipped after transaction commit. As a result there is a window for data loss should the primary server suffer a catastrophic failure: transactions not yet shipped will be lost. The length of the diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6102c70f47..471a230433 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1,4 +1,4 @@ - + Server Configuration @@ -1253,7 +1253,7 @@ SET ENABLE_SEQSCAN TO OFF; function, which some operating systems lack. If the function is not present then setting this parameter to anything but zero will result in an error. On some operating systems the function is present but - does not actually do anything (e.g. Solaris). + does not actually do anything (e.g., Solaris). @@ -4333,7 +4333,7 @@ SET XML OPTION { DOCUMENT | CONTENT }; If a dynamically loadable module needs to be opened and the file name specified in the CREATE FUNCTION or LOAD command - does not have a directory component (i.e. 
the + does not have a directory component (i.e., the name does not contain a slash), the system will search this path for the required file. @@ -4503,7 +4503,7 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir' The shared lock table is created to track locks on max_locks_per_transaction * ( + ) objects (e.g. tables); + linkend="guc-max-prepared-transactions">) objects (e.g., tables); hence, no more than this many distinct objects can be locked at any one time. This parameter controls the average number of object locks allocated for each transaction; individual transactions diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml index 9790eb60dd..0ef92b4896 100644 --- a/doc/src/sgml/contrib.sgml +++ b/doc/src/sgml/contrib.sgml @@ -1,4 +1,4 @@ - + Additional Supplied Modules @@ -16,7 +16,7 @@ When building from the source distribution, these modules are not built - automatically. You can build and install all of them by running + automatically. You can build and install all of them by running: gmake gmake install @@ -25,7 +25,7 @@ or to build and install just one selected module, do the same in that module's subdirectory. Many of the modules have regression tests, which can be executed by - running + running: gmake installcheck diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index fe7f95f086..b2c3e7f395 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -1,4 +1,4 @@ - + Data Types @@ -25,7 +25,7 @@ Aliases column are the names used internally by PostgreSQL for historical reasons. In addition, some internally used or deprecated types are available, - but they are not listed here. + but are not listed here. @@ -73,7 +73,7 @@ box - rectangular box in the plane + rectangular box on a plane @@ -103,7 +103,7 @@ circle - circle in the plane + circle on a plane @@ -115,7 +115,7 @@ double precision float8 - double precision floating-point number + double precision floating-point number (8 bytes) @@ -139,19 +139,19 @@ line - infinite line in the plane + infinite line on a plane lseg - line segment in the plane + line segment on a plane macaddr - MAC address + MAC (Media Access Control) address @@ -171,25 +171,25 @@ path - geometric path in the plane + geometric path on a plane point - geometric point in the plane + geometric point on a plane polygon - closed geometric path in the plane + closed geometric path on a plane real float4 - single precision floating-point number + single precision floating-point number (4 bytes) @@ -213,7 +213,7 @@ time [ (p) ] [ without time zone ] - time of day + time of day (no time zone) @@ -225,7 +225,7 @@ timestamp [ (p) ] [ without time zone ] - date and time + date and time (no time zone) @@ -288,9 +288,9 @@ and output functions. Many of the built-in types have obvious external formats. However, several types are either unique to PostgreSQL, such as geometric - paths, or have several possibilities for formats, such as the date + paths, or have several possible formats, such as the date and time types. - Some of the input and output functions are not invertible. That is, + Some of the input and output functions are not invertible, i.e. the result of an output function might lose accuracy when compared to the original input. 
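To make the invertibility caveat concrete, here is a minimal sketch; floating-point output is the classic case, and the exact digits shown assume the default setting of extra_float_digits = 0:

SELECT 3.141592653589793238462::float8;
      float8
------------------
 3.14159265358979
(1 row)

The stored double precision value cannot hold all of the input digits, so the text produced on output no longer matches the original input string.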
@@ -332,7 +332,7 @@ integer 4 bytes - usual choice for integer + typical choice for integer -2147483648 to +2147483647 @@ -431,21 +431,21 @@ - The type integer is the usual choice, as it offers + The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type should only - be used if the integer range is not sufficient, + be used if the integer range is insufficient, because the latter is definitely faster. - The bigint type might not function correctly on all - platforms, since it relies on compiler support for eight-byte - integers. On a machine without such support, bigint + On very minimal operating systems the bigint type + might not function correctly because it relies on compiler support + for eight-byte integers. On such machines, bigint acts the same as integer (but still takes up eight - bytes of storage). However, we are not aware of any reasonable - platform where this is actually the case. + bytes of storage). (We are not aware of any + platform where this is true.) @@ -453,7 +453,7 @@ integer (or int), smallint, and bigint. The type names int2, int4, and - int8 are extensions, which are shared with various + int8 are extensions, which are also used by other SQL database systems. @@ -481,11 +481,11 @@ especially recommended for storing monetary amounts and other quantities where exactness is required. However, arithmetic on numeric values is very slow compared to the integer - types, or to the floating-point types described in the next section. + and floating-point types described in the next section. - In what follows we use these terms: The + We use the following terms below: The scale of a numeric is the count of decimal digits in the fractional part, to the right of the decimal point. The precision of a @@ -558,7 +558,7 @@ NUMERIC type allows the special value NaN, meaning not-a-number. Any operation on NaN yields another NaN. When writing this value - as a constant in a SQL command, you must put quotes around it, + as a constant in an SQL command, you must put quotes around it, for example UPDATE table SET x = 'NaN'. On input, the string NaN is recognized in a case-insensitive manner. @@ -621,10 +621,10 @@ NUMERIC Inexact means that some values cannot be converted exactly to the internal format and are stored as approximations, so that storing - and printing back out a value might show slight discrepancies. + and retrieving a value might show slight discrepancies. Managing these errors and how they propagate through calculations is the subject of an entire branch of mathematics and computer - science and will not be discussed further here, except for the + science and will not be discussed here, except for the following points: @@ -645,8 +645,8 @@ NUMERIC - Comparing two floating-point values for equality might or might - not work as expected. + Comparing two floating-point values for equality might not + always work as expected. @@ -702,7 +702,7 @@ NUMERIC notations float and float(p) for specifying inexact numeric types. Here, p specifies - the minimum acceptable precision in binary digits. + the minimum acceptable precision in binary digits. PostgreSQL accepts float(1) to float(24) as selecting the real type, while @@ -717,7 +717,7 @@ NUMERIC Prior to PostgreSQL 7.4, the precision in float(p) was taken to mean - so many decimal digits. This has been corrected to match the SQL + so many decimal digits. 
This has been corrected to match the SQL standard, which specifies that the precision is measured in binary digits. The assumption that real and double precision have exactly 24 and 53 bits in the @@ -762,7 +762,7 @@ NUMERIC The data types serial and bigserial are not true types, but merely - a notational convenience for setting up unique identifier columns + a notational convenience for creating unique identifier columns (similar to the AUTO_INCREMENT property supported by some other databases). In the current implementation, specifying: @@ -786,7 +786,7 @@ ALTER SEQUENCE tablename_NOT NULL constraint is applied to ensure that a null value cannot be explicitly - inserted, either. (In most cases you would also want to attach a + inserted. (In most cases you would also want to attach a UNIQUE or PRIMARY KEY constraint to prevent duplicate values from being inserted by accident, but this is not automatic.) Lastly, the sequence is marked as owned by @@ -797,8 +797,8 @@ ALTER SEQUENCE tablename_ Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If - you wish a serial column to be in a unique constraint or a - primary key, it must now be specified, same as with + you wish a serial column to have a unique constraint or be a + primary key, it must now be specified just like any other data type. @@ -815,7 +815,7 @@ ALTER SEQUENCE tablename_ The type names serial and serial4 are equivalent: both create integer columns. The type - names bigserial and serial8 work just + names bigserial and serial8 work the same way, except that they create a bigint column. bigserial should be used if you anticipate the use of more than 231 identifiers over the @@ -837,9 +837,10 @@ ALTER SEQUENCE tablename_ The money type stores a currency amount with a fixed fractional precision; see . + linkend="datatype-money-table">. The fractional precision + is controlled by the database locale. Input is accepted in a variety of formats, including integer and - floating-point literals, as well as typical + floating-point literals, as well as typical currency formatting, such as '$1,000.00'. Output is generally in the latter form but depends on the locale. Non-quoted numeric values can be converted to money by @@ -859,10 +860,10 @@ SELECT regexp_replace('52093.89'::money::text, '[$,]', '', 'g')::numeric; - Since the output of this data type is locale-sensitive, it may not + Since the output of this data type is locale-sensitive, it might not work to load money data into a database that has a different setting of lc_monetary. To avoid problems, before - restoring a dump make sure lc_monetary has the same or + restoring a dump into a new database make sure lc_monetary has the same or equivalent value as in the database that was dumped. @@ -960,7 +961,7 @@ SELECT regexp_replace('52093.89'::money::text, '[$,]', '', 'g')::numeric; character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to - n characters in length. An attempt to store a + n characters in length (not bytes). An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat @@ -1015,16 +1016,16 @@ SELECT regexp_replace('52093.89'::money::text, '[$,]', '', 'g')::numeric; The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of - character. 
Longer strings have 4 bytes overhead instead + character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data - type declaration is less than that. It wouldn't be very useful to + type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of - characters and bytes can be quite different anyway. If you desire to + characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.) @@ -1032,12 +1033,12 @@ SELECT regexp_replace('52093.89'::money::text, '[$,]', '', 'g')::numeric; - There are no performance differences between these three types, - apart from increased storage size when using the blank-padded - type, and a few extra cycles to check the length when storing into + There is no performance difference between these three types, + apart from increased storage space when using the blank-padded + type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance - advantages in some other database systems, it has no such advantages in + advantages in some other database systems, there is no such advantage in PostgreSQL. In most situations text or character varying should be used instead. @@ -1095,16 +1096,17 @@ SELECT b, char_length(b) FROM test2; There are two other fixed-length character types in PostgreSQL, shown in . The name - type exists only for storage of identifiers + type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant - NAMEDATALEN. The length is set at compile time (and + NAMEDATALEN in C source code. + The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system - catalogs as a poor-man's enumeration type. + catalogs as a simplistic enumeration type.
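As a quick sketch of the length-checking rules described earlier in this section (test3 is a scratch table used only for this illustration; the same rules apply to character and character varying alike):

CREATE TABLE test3 (a varchar(5));
INSERT INTO test3 VALUES ('too long');
ERROR:  value too long for type character varying(5)

SELECT 'too long'::varchar(5);   -- an explicit cast truncates silently
 varchar
---------
 too l
(1 row)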
@@ -1172,8 +1174,8 @@ SELECT b, char_length(b) FROM test2; A binary string is a sequence of octets (or bytes). Binary - strings are distinguished from character strings by two - characteristics: First, binary strings specifically allow storing + strings are distinguished from character strings in two + ways: First, binary strings specifically allow storing octets of value zero and other non-printable octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any @@ -1191,8 +1193,8 @@ SELECT b, char_length(b) FROM test2; values must be escaped (but all octet values can be escaped) when used as part of a string literal in an SQL statement. In - general, to escape an octet, it is converted into the three-digit - octal number equivalent of its decimal octet value, and preceded + general, to escape an octet, convert it into its three-digit + octal value and precede it by two backslashes. shows the characters that must be escaped, and gives the alternative escape sequences where applicable. @@ -1249,16 +1251,16 @@ SELECT b, char_length(b) FROM test2;
- The requirement to escape non-printable octets actually + The requirement to escape non-printable octets varies depending on locale settings. In some instances you can get away with leaving them unescaped. Note that the result in each of the examples in was exactly one octet in - length, even though the output representation of the zero octet and - backslash are more than one character. + length, even though the output representation is sometimes + more than one character. - The reason that you have to write so many backslashes, as shown + The reason multiple backslashes are required, as shown in , is that an input string written as a string literal must pass through two parse phases in the PostgreSQL server. @@ -1280,12 +1282,12 @@ SELECT b, char_length(b) FROM test2; - Bytea octets are also escaped in the output. In general, each + Bytea octets are sometimes escaped when output. In general, each non-printable octet is converted into its equivalent three-digit octal value and preceded by one backslash. Most printable octets are represented by their standard representation in the client character set. The octet with decimal - value 92 (backslash) has a special alternative output representation. + value 92 (backslash) is doubled in the output. Details are in . @@ -1406,7 +1408,7 @@ SELECT b, char_length(b) FROM test2; timestamp [ (p) ] [ without time zone ] 8 bytes - both date and time + both date and time (no time zone) 4713 BC 294276 AD 1 microsecond / 14 digits @@ -1422,7 +1424,7 @@ SELECT b, char_length(b) FROM test2; date 4 bytes - dates only + date (no time of day) 4713 BC 5874897 AD 1 day @@ -1430,7 +1432,7 @@ SELECT b, char_length(b) FROM test2; time [ (p) ] [ without time zone ] 8 bytes - times of day only + time of day (no date) 00:00:00 24:00:00 1 microsecond / 14 digits @@ -1446,7 +1448,7 @@ SELECT b, char_length(b) FROM test2; interval [ fields ] [ (p) ] 12 bytes - time intervals + time interval -178000000 years 178000000 years 1 microsecond / 14 digits @@ -1542,9 +1544,8 @@ SELECT b, char_length(b) FROM test2; The types abstime and reltime are lower precision types which are used internally. - You are discouraged from using these types in new - applications and are encouraged to move any old - ones over when appropriate. Any or all of these internal types + You are discouraged from using these types in + applications; these internal types might disappear in a future release. @@ -1555,7 +1556,7 @@ SELECT b, char_length(b) FROM test2; Date and time input is accepted in almost any reasonable format, including ISO 8601, SQL-compatible, traditional POSTGRES, and others. - For some formats, ordering of month, day, and year in date input is + For some formats, ordering of day, month, and year in date input is ambiguous and there is support for specifying the expected ordering of these fields. Set the parameter to MDY to select month-day-year interpretation, @@ -1582,8 +1583,7 @@ SELECT b, char_length(b) FROM test2; type [ (p) ] 'value' - where p in the optional precision - specification is an integer corresponding to the number of + where p is an optional precision corresponding to the number of fractional digits in the seconds field. Precision can be specified for time, timestamp, and interval types. 
The allowed values are mentioned @@ -1613,15 +1613,15 @@ SELECT b, char_length(b) FROM test2; - - January 8, 1999 - unambiguous in any datestyle input mode - 1999-01-08 ISO 8601; January 8 in any mode (recommended format) + + January 8, 1999 + unambiguous in any datestyle input mode + 1/8/1999 January 8 in MDY mode; @@ -1681,7 +1681,7 @@ SELECT b, char_length(b) FROM test2; January 8, 99 BC - year 99 before the Common Era + year 99 BC @@ -1705,7 +1705,7 @@ SELECT b, char_length(b) FROM test2; The time-of-day types are time [ (p) ] without time zone and time [ (p) ] with time - zone. Writing just time is equivalent to + zone; time is equivalent to time without time zone. @@ -1752,7 +1752,7 @@ SELECT b, char_length(b) FROM test2; 04:05 AM - same as 04:05; AM does not affect value + same as 04:05 (AM ignored) 04:05 PM @@ -1854,7 +1854,7 @@ SELECT b, char_length(b) FROM test2; - Valid input for the time stamp types consists of a concatenation + Valid input for the time stamp types consists of the concatenation of a date and a time, followed by an optional time zone, followed by an optional AD or BC. (Alternatively, AD/BC can appear @@ -1870,7 +1870,7 @@ SELECT b, char_length(b) FROM test2; are valid values, which follow the ISO 8601 - standard. In addition, the wide-spread format: + standard. In addition, the common format: January 8 04:05:06 1999 PST @@ -1880,18 +1880,25 @@ January 8 04:05:06 1999 PST The SQL standard differentiates timestamp without time zone and timestamp with time zone literals by the presence of a - + or -. Hence, according to the standard, + + or - symbol after the time + indicating the time zone offset. Hence, according to the standard: + TIMESTAMP '2004-10-19 10:23:54' - is a timestamp without time zone, while + + is a timestamp without time zone, while: + TIMESTAMP '2004-10-19 10:23:54+02' + is a timestamp with time zone. PostgreSQL never examines the content of a literal string before determining its type, and therefore will treat both of the above as timestamp without time zone. To ensure that a literal is treated as timestamp with time zone, give it the correct explicit type: + TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' - In a literal that has been decided to be timestamp without time + + In a literal that has been determined to be timestamp without time zone, PostgreSQL will silently ignore any time zone indication. That is, the resulting value is derived from the date/time @@ -1923,7 +1930,7 @@ January 8 04:05:06 1999 PST Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given - as timezone local time. A different zone reference can + as timezone local time. A different time zone can be specified for the conversion using AT TIME ZONE. @@ -1947,11 +1954,11 @@ January 8 04:05:06 1999 PST linkend="datatype-datetime-special-table">. The values infinity and -infinity are specially represented inside the system and will be displayed - the same way; but the others are simply notational shorthands + unchanged; but the others are simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.) - All of these values need to be written in single quotes when used + All of these values need to be enclosed in single quotes when used as constants in SQL commands. 
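For instance (a sketch; the epoch output shown assumes the session time zone is UTC, and now/today are fixed the moment the constant is read):

SELECT 'epoch'::timestamptz;
      timestamptz
------------------------
 1970-01-01 00:00:00+00
(1 row)

SELECT 'today'::date = CURRENT_DATE;
 ?column?
----------
 t
(1 row)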
@@ -2018,8 +2025,8 @@ January 8 04:05:06 1999 PST CURRENT_TIMESTAMP, LOCALTIME,
LOCALTIMESTAMP. The latter four accept an optional subsecond
precision specification. (See .) Note however that these are
- SQL functions and are not recognized as data input strings.
+ linkend="functions-datetime-current">.) Note that these are
+ SQL functions and are not recognized in data input strings.
@@ -2041,14 +2048,15 @@ January 8 04:05:06 1999 PST
- The output format of the date/time types can be set to one of the four
- styles ISO 8601,
- SQL (Ingres), traditional POSTGRES, and
- German, using the command SET datestyle. The default
+ The output format of the date/time types can be one of the four
+ styles: ISO 8601,
+ SQL (Ingres), traditional POSTGRES
+ (Unix date format), and
+ German. It can be set using the SET datestyle command. The default
is the ISO format. (The SQL standard requires the use of the ISO 8601
- format. The name of the SQL output format is a
- historical accident.)
+ format. The name of the SQL output format is poorly
+ chosen and an historical accident.)
shows examples of each output style. The output of the date and time types
is of course only the date or time part
@@ -2172,7 +2180,7 @@ January 8 04:05:06 1999 PST
Although the date type
- does not have an associated time zone, the
+ cannot have an associated time zone, the
time type can.
Time zones in the real world have little meaning unless
associated with a date as well as a time,
@@ -2184,7 +2192,7 @@ January 8 04:05:06 1999 PST
The default time zone is specified as a constant numeric offset
- from UTC. It is therefore not possible to adapt to
+ from UTC. It is therefore impossible to adapt to
daylight-saving time when doing date/time arithmetic across
DST boundaries.
@@ -2196,7 +2204,7 @@ January 8 04:05:06 1999 PST
To address these difficulties, we recommend using date/time types
that contain both date and time when using time zones. We
- recommend not using the type time with
+ do not recommend using the type time with
time zone (though it is supported by
PostgreSQL for legacy applications and
for compliance with the SQL standard).
@@ -2230,12 +2238,12 @@ January 8 04:05:06 1999 PST
A time zone abbreviation, for example PST. Such a
specification merely defines a particular offset from UTC, in
- contrast to full time zone names which might imply a set of daylight
+ contrast to full time zone names which can imply a set of daylight
savings transition-date rules as well. The recognized abbreviations
are listed in the pg_timezone_abbrevs view (see ). You cannot set the
configuration parameters or
- using a time
+ to a time
zone abbreviation, but you can use abbreviations in
date/time input values and with the AT TIME ZONE
operator.
@@ -2252,7 +2260,7 @@ January 8 04:05:06 1999 PST
optional daylight-savings zone abbreviation, assumed to stand for one
hour ahead of the given offset. For example, if EST5EDT
were not already a recognized zone name, it would be accepted and would
- be functionally equivalent to USA East Coast time. When a
+ be functionally equivalent to United States East Coast time. When a
daylight-savings zone name is present, it is assumed to be used
according to the same daylight-savings transition rules used in the
zoneinfo time zone database's posixrules entry.
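A sketch of how the two kinds of zone specifications behave in practice (this assumes the server's zoneinfo database includes America/Los_Angeles):

SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' AT TIME ZONE 'PST';
      timezone
---------------------
 2004-10-19 00:23:54
(1 row)

SELECT TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02' AT TIME ZONE 'America/Los_Angeles';
      timezone
---------------------
 2004-10-19 01:23:54
(1 row)

The abbreviation PST is always UTC-8, while the full name applies the daylight-savings rule in force on that date (PDT, UTC-7), as summarized in the next paragraph.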
@@ -2265,10 +2273,10 @@ January 8 04:05:06 1999 PST
- There is a conceptual and practical difference between the abbreviations
- and the full names: abbreviations always represent a fixed offset from
+ In summary, there is a difference between abbreviations
+ and full names: abbreviations always represent a fixed offset from
UTC, whereas most of the full names imply a local daylight-savings time
- rule and so have two possible UTC offsets.
+ rule, and so have two possible UTC offsets.
@@ -2287,7 +2295,7 @@ January 8 04:05:06 1999 PST
In all cases, timezone names are recognized case-insensitively.
(This is a change from PostgreSQL versions
- prior to 8.2, which were case-sensitive in some contexts and not others.)
+ prior to 8.2, which were case-sensitive in some contexts but not others.)
@@ -2308,7 +2316,7 @@ January 8 04:05:06 1999 PST
If timezone is not specified in
- postgresql.conf nor as a server command-line option,
+ postgresql.conf or as a server command-line option,
the server attempts to use the value of the TZ
environment variable as the default time zone. If TZ
is not defined or is not any of the time zone names known to
PostgreSQL, the
default time zone is selected as the closest match among
PostgreSQL's known time zones.
(These rules are also used to choose the default value of
- , if it is not specified.)
+ , if not specified.)
@@ -2332,9 +2340,9 @@ January 8 04:05:06 1999 PST
- The PGTZ environment variable, if set at the
- client, is used by libpq
- applications to send a SET TIME ZONE
+ The PGTZ environment variable is used by
+ libpq clients
+ to send a SET TIME ZONE
command to the server upon connection.
@@ -2350,7 +2358,7 @@ January 8 04:05:06 1999 PST
- interval values can be written with the following
+ interval values can be written using the following
verbose syntax:
@@ -2366,7 +2374,7 @@ January 8 04:05:06 1999 PST
or abbreviations or plurals of these units;
direction can be ago or
empty. The at sign (@) is optional noise. The amounts
- of different units are implicitly added up with appropriate
+ of the different units are implicitly added with appropriate
sign accounting. ago negates all the fields.
This syntax is also used for interval output, if
is set to
@@ -2639,8 +2647,8 @@ P years-months-days <
PostgreSQL uses Julian dates
- for all date/time calculations. They have the nice property of correctly
- predicting/calculating any date more recent than 4713 BC
+ for all date/time calculations. This has the useful property of correctly
+ calculating dates from 4713 BC
to far into the future, using the assumption that the length of the
year is 365.2425 days.
@@ -2700,9 +2708,9 @@ P years-months-days <
'off'
'0'
- Leading and trailing whitespace is ignored. Using the key words
- TRUE and FALSE is preferred
- (and SQL-compliant).
+ Leading and trailing whitespace and case are ignored. The key words
+ TRUE and FALSE are the preferred
+ usage (and SQL-compliant).
@@ -2750,9 +2758,9 @@ SELECT * FROM test1 WHERE a;
Enumerated (enum) types are data types that
- are comprised of a static, predefined set of values with a
- specific order. They are equivalent to the enum
- types in a number of programming languages. An example of an enum
type might be the days of the week, or a set of status values for a
piece of data.
@@ -2796,7 +2804,7 @@ SELECT * FROM person WHERE current_mood = 'happy'; The ordering of the values in an enum type is the - order in which the values were listed when the type was declared. + order in which the values were listed when the type was created. All standard comparison operators and related aggregate functions are supported for enums. For example: @@ -2820,8 +2828,9 @@ SELECT * FROM person WHERE current_mood > 'sad' ORDER BY current_mood; Moe | happy (2 rows) -SELECT name FROM person - WHERE current_mood = (SELECT MIN(current_mood) FROM person); +SELECT name +FROM person +WHERE current_mood = (SELECT MIN(current_mood) FROM person); name ------- Larry @@ -2834,16 +2843,16 @@ SELECT name FROM person Type Safety - Enumerated types are completely separate data types and may not - be compared with each other. + Each enumerated data type is separate and cannot + be compared with other enumerated types. Lack of Casting CREATE TYPE happiness AS ENUM ('happy', 'very happy', 'ecstatic'); -CREATE TABLE holidays ( - num_weeks int, +CREATE TABLE holidays ( + num_weeks integer, happiness happiness ); INSERT INTO holidays(num_weeks,happiness) VALUES (4, 'happy'); @@ -2889,7 +2898,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. - Spaces in the labels are significant, too. + White space in the labels is significant too. @@ -2928,7 +2937,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays point 16 bytes - Point on the plane + Point on a plane (x,y) @@ -2971,7 +2980,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays circle 24 bytes Circle - <(x,y),r> (center and radius) + <(x,y),r> (center point and radius) @@ -3000,7 +3009,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays where x and y are the respective - coordinates as floating-point numbers. + coordinates, as floating-point numbers. @@ -3063,11 +3072,9 @@ SELECT person.name, holidays.num_weeks FROM person, holidays - Boxes are output using the first syntax. - The corners are reordered on input to store - the upper right corner, then the lower left corner. - Other corners of the box can be entered, but the lower - left and upper right corners are determined from the input and stored. + Boxes are output using the first syntax. Any two opposite corners + can be supplied; the corners are reordered on input to store the + upper right and lower left corners. @@ -3081,7 +3088,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays Paths are represented by lists of connected points. Paths can be open, where - the first and last points in the list are not considered connected, or + the first and last points in the list are considered not connected, or closed, where the first and last points are considered connected. @@ -3104,7 +3111,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays - Paths are output using the first syntax. + Paths are output using the first appropriate syntax. @@ -3117,8 +3124,8 @@ SELECT person.name, holidays.num_weeks FROM person, holidays Polygons are represented by lists of points (the vertexes of the - polygon). Polygons should probably be - considered equivalent to closed paths, but are stored differently + polygon). Polygons are very similar to closed paths, but are + stored differently and have their own set of support routines. @@ -3149,7 +3156,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays - Circles are represented by a center point and a radius. 
+ Circles are represented by a center point and radius. Values of type circle are specified using the following syntax: @@ -3161,7 +3168,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays where (x,y) - is the center and r is the radius of the circle. + is the center point and r is the radius of the circle. @@ -3182,9 +3189,9 @@ SELECT person.name, holidays.num_weeks FROM person, holidays PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses, as shown in . It - is preferable to use these types instead of plain text types to store - network addresses, because - these types offer input error checking and several specialized + is better to use these types instead of plain text types to store + network addresses because + these types offer input error checking and specialized operators and functions (see ). @@ -3225,7 +3232,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays When sorting inet or cidr data types, IPv4 addresses will always sort before IPv6 addresses, including - IPv4 addresses encapsulated or mapped into IPv6 addresses, such as + IPv4 addresses encapsulated or mapped to IPv6 addresses, such as ::10.2.3.4 or ::ffff:10.4.3.2. @@ -3239,14 +3246,14 @@ SELECT person.name, holidays.num_weeks FROM person, holidays The inet type holds an IPv4 or IPv6 host address, and - optionally the identity of the subnet it is in, all in one field. - The subnet identity is represented by stating how many bits of - the host address represent the network address (the + optionally its subnet, all in one field. + The subnet is represented by the number of network address bits + present in the host address (the netmask). If the netmask is 32 and the address is IPv4, then the value does not indicate a subnet, only a single host. In IPv6, the address length is 128 bits, so 128 bits specify a unique host address. Note that if you - want to accept networks only, you should use the + want to accept only networks, you should use the cidr type rather than inet. @@ -3259,7 +3266,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays y is the number of bits in the netmask. If the /y - part is left off, then the + is missing, the netmask is 32 for IPv4 and 128 for IPv6, so the value represents just a single host. On display, the /y @@ -3285,7 +3292,7 @@ SELECT person.name, holidays.num_weeks FROM person, holidays class="parameter">y is the number of bits in the netmask. If y is omitted, it is calculated using assumptions from the older classful network numbering system, except - that it will be at least large enough to include all of the octets + it will be at least large enough to include all of the octets written in the input. It is an error to specify a network address that has bits set to the right of the specified netmask. @@ -3553,9 +3560,9 @@ SELECT * FROM test; are designed to support full text search, which is the activity of searching through a collection of natural-language documents to locate those that best match a query. - The tsvector type represents a document in a form suited - for text search, while the tsquery type similarly represents - a query. + The tsvector type represents a document stored in a form optimized + for text search; tsquery type similarly represents + a text query. provides a detailed explanation of this facility, and summarizes the related functions and operators. 
@@ -3570,9 +3577,9 @@ SELECT * FROM test; A tsvector value is a sorted list of distinct - lexemes, which are words that have been - normalized to make different variants of the same word look - alike (see for details). Sorting and + lexemes, which are words which have been + normalized to merge different variants of the same word + (see for details). Sorting and duplicate-elimination are done automatically during input, as shown in this example: @@ -3593,8 +3600,8 @@ SELECT $$the lexeme ' ' contains spaces$$::tsvector; ' ' 'contains' 'lexeme' 'spaces' 'the' - (We use dollar-quoted string literals in this example and the next one, - to avoid confusing matters by having to double quote marks within the + (We use dollar-quoted string literals in this example and the next one + to avoid the confusion of having to double quote marks within the literals.) Embedded quotes and backslashes must be doubled: @@ -3604,8 +3611,8 @@ SELECT $$the lexeme 'Joe''s' contains a quote$$::tsvector; 'Joe''s' 'a' 'contains' 'lexeme' 'quote' 'the' - Optionally, integer position(s) - can be attached to any or all of the lexemes: + Optionally, integer positions + can be attached to lexemes: SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector; @@ -3617,7 +3624,7 @@ SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::ts A position normally indicates the source word's location in the document. Positional information can be used for proximity ranking. Position values can - range from 1 to 16383; larger numbers are silently clamped to 16383. + range from 1 to 16383; larger numbers are silently set to 16383. Duplicate positions for the same lexeme are discarded. @@ -3643,7 +3650,7 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector; It is important to understand that the tsvector type itself does not perform any normalization; - it assumes that the words it is given are normalized appropriately + it assumes the words it is given are normalized appropriately for the application. For example, @@ -3680,7 +3687,7 @@ SELECT to_tsvector('english', 'The Fat Rats'); A tsquery value stores lexemes that are to be - searched for, and combines them using the boolean operators + searched for, and combines them by honoring the boolean operators & (AND), | (OR), and ! (NOT). Parentheses can be used to enforce grouping of the operators: @@ -3710,7 +3717,7 @@ SELECT 'fat & rat & ! cat'::tsquery; Optionally, lexemes in a tsquery can be labeled with one or more weight letters, which restricts them to match only - tsvector lexemes with one of those weights: + tsvector lexemes with matching weights: SELECT 'fat:ab & cat'::tsquery; @@ -3734,10 +3741,10 @@ SELECT 'super:*'::tsquery; - Quoting rules for lexemes are the same as described above for + Quoting rules for lexemes are the same as described previously for lexemes in tsvector; and, as with tsvector, - any required normalization of words must be done before putting - them into the tsquery type. The to_tsquery + any required normalization of words must be done before converting + to the tsquery type. The to_tsquery function is convenient for performing such normalization: @@ -3762,13 +3769,13 @@ SELECT to_tsquery('Fat:ab & Cats'); The data type uuid stores Universally Unique Identifiers (UUID) as defined by RFC 4122, ISO/IEC 9834-8:2005, and related standards. - (Some systems refer to this data type as globally unique identifier, or - GUID,GUID instead.) 
Such an + (Some systems refer to this data type as a globally unique identifier, or + GUID,GUID instead.) This identifier is a 128-bit quantity that is generated by an algorithm chosen to make it very unlikely that the same identifier will be generated by anyone else in the known universe using the same algorithm. Therefore, for distributed systems, these identifiers provide a better uniqueness - guarantee than that which can be achieved using sequence generators, which + guarantee than sequence generators, which are only unique within a single database. @@ -3816,10 +3823,10 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11 - The data type xml can be used to store XML data. Its + The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it - checks the input values for well-formedness, and there are support - functions to perform type-safe operations on it; see . Use of this data type requires the installation to have been built with configure --with-libxml. @@ -3862,19 +3869,19 @@ xml 'bar' - The xml type does not validate its input values - against a possibly included document type declaration + The xml type does not validate input values + against an optionally-supplied document type declaration (DTD).DTD - The inverse operation, producing character string type values from + The inverse operation, producing a character string value from xml, uses the function xmlserialize:xmlserialize XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type ) - type can be one of + type can be character, character varying, or text (or an alias name for those). Again, according to the SQL standard, this is the only way to convert between type @@ -3883,14 +3890,14 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS - When character string values are cast to or from type + When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the XML option XML option session configuration parameter, which can be set using the - standard command + standard command: SET XML OPTION { DOCUMENT | CONTENT }; @@ -3915,38 +3922,38 @@ SET xmloption TO { DOCUMENT | CONTENT }; end; see . This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in - XML data might become invalid as the character data is converted - to other encodings while travelling between client and server, - while the embedded encoding declaration is not changed. To cope - with this behavior, an encoding declaration contained in a - character string presented for input to the xml type - is ignored, and the content is always assumed + XML data can become invalid as the character data is converted + to other encodings while travelling between client and server + because the embedded encoding declaration is not changed. To cope + with this behavior, encoding declarations contained in + character strings presented for input to the xml type + are ignored, and content is assumed to be in the current server encoding. Consequently, for correct - processing, such character strings of XML data must be sent off + processing, character strings of XML data must be sent from the client in the current client encoding. 
It is the - responsibility of the client to either convert the document to the - current client encoding before sending it off to the server or to + responsibility of the client to either convert documents to the + current client encoding before sending them to the server or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and - clients must assume that the data is in the current client + clients should assume all data is in the current client encoding. - When using the binary mode to pass query parameters to the server + When using binary mode to pass query parameters to the server and query results back to the client, no character set conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by - the XML standard; note that PostgreSQL does not support UTF-16 at - all). On output, data will have an encoding declaration + the XML standard; note that PostgreSQL does not support UTF-16). + On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted. Needless to say, processing XML data with PostgreSQL will be less - error-prone and more efficient if data encoding, client encoding, + error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8. @@ -3973,17 +3980,17 @@ SET xmloption TO { DOCUMENT | CONTENT }; Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, - possible workarounds would be casting the expression to a + possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath - expression. The actual query would of course have to be adjusted + expression. Of course, the actual query would have to be adjusted to search by the indexed expression. - The text-search functionality in PostgreSQL could also be used to speed - up full-document searches in XML data. The necessary - preprocessing support is, however, not available in the PostgreSQL - distribution in this release. + The text-search functionality in PostgreSQL can also be used to speed + up full-document searches of XML data. The necessary + preprocessing support is, however, not yet available in the PostgreSQL + distribution. @@ -4191,13 +4198,14 @@ SELECT * FROM pg_attribute The regproc and regoper alias types will only accept input names that are unique (not overloaded), so they are of limited use; for most uses regprocedure or - regoperator is more appropriate. For regoperator, + regoperator are more appropriate. For regoperator, unary operators are identified by writing NONE for the unused operand. - An additional property of the OID alias types is that if a + An additional property of the OID alias types is the creation of + dependencies. If a constant of one of these types appears in a stored expression (such as a column default expression or view), it creates a dependency on the referenced object. For example, if a column has a default @@ -4311,7 +4319,7 @@ SELECT * FROM pg_attribute any - Indicates that a function accepts any input data type whatever. 
+       Indicates that a function accepts any input data type.
@@ -4398,7 +4406,7 @@ SELECT * FROM pg_attribute
     The internal pseudo-type is used to declare functions
     that are meant only to be called internally by the database
-    system, and not by direct invocation in a SQL
+    system, and not by direct invocation in an SQL
     query. If a function has at least one internal-type
     argument then it cannot be called from SQL. To
     preserve the type safety of this restriction it is important to
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index f32c1fc70d..caf4cfd025 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1,4 +1,4 @@
- +
 Data Definition
@@ -153,7 +153,7 @@ DROP TABLE products;
-    If you need to modify a table that already exists look into
     later in this chapter.
@@ -206,7 +206,7 @@ CREATE TABLE products (
     The default value can be an expression, which will be
     evaluated whenever the default value is inserted
     (not when the table is created). A common example
-    is that a timestamp column can have a default of now(),
+    is for a timestamp column to have a default of CURRENT_TIMESTAMP,
     so that it gets set to the time of row insertion. Another common
     example is generating a serial number for each row.
     In PostgreSQL this is typically done by
@@ -374,8 +374,8 @@ CREATE TABLE products (
-    Names can be assigned to table constraints in just the same way as
-    for column constraints:
+    Names can be assigned to table constraints in the same way as
+    column constraints:
 CREATE TABLE products (
     product_no integer,
@@ -550,15 +550,15 @@ CREATE TABLE products (
-    In general, a unique constraint is violated when there are two or
-    more rows in the table where the values of all of the
+    In general, a unique constraint is violated when there is more than
+    one row in the table where the values of all of the
     columns included in the constraint are equal.
     However, two null values are not considered equal in this
     comparison. That means even in the presence of a
     unique constraint it is possible to store duplicate
     rows that contain a null value in at least one of the constrained
-    columns. This behavior conforms to the SQL standard, but we have
-    heard that other SQL databases might not follow this rule. So be
+    columns. This behavior conforms to the SQL standard, but there
+    might be other SQL databases that do not follow this rule. So be
     careful when developing applications that are intended to be portable.
@@ -857,7 +857,7 @@ CREATE TABLE order_items (
     restrictions are separate from whether the name is a key word or not;
     quoting a name will not allow you to escape these
     restrictions.) You do not really need to be concerned about these
-    columns, just know they exist.
+    columns; just know they exist.
@@ -1037,8 +1037,8 @@ CREATE TABLE order_items (
     Command identifiers are also 32-bit quantities. This creates a hard limit
     of 2^32 (4 billion) SQL commands
     within a single transaction. In practice this limit is not a
-    problem — note that the limit is on number of
-    SQL commands, not number of rows processed.
+    problem — note that the limit is on the number of
+    SQL commands, not the number of rows processed.
     Also, as of PostgreSQL 8.3, only commands
     that actually modify the database contents will consume a command
     identifier.
@@ -1055,7 +1055,7 @@ CREATE TABLE order_items (
     When you create a table and you realize that you made a mistake, or
-    the requirements of the application change, then you can drop the
+    the requirements of the application change, you can drop the
     table and create it again.
But this is not a convenient option if the table is already filled with data, or if the table is referenced by other database objects (for instance a foreign key @@ -1067,31 +1067,31 @@ CREATE TABLE order_items ( - You can + You can: - Add columns, + Add columns - Remove columns, + Remove columns - Add constraints, + Add constraints - Remove constraints, + Remove constraints - Change default values, + Change default values - Change column data types, + Change column data types - Rename columns, + Rename columns - Rename tables. + Rename tables @@ -1110,7 +1110,7 @@ CREATE TABLE order_items ( - To add a column, use a command like this: + To add a column, use a command like: ALTER TABLE products ADD COLUMN description text; @@ -1154,7 +1154,7 @@ ALTER TABLE products ADD COLUMN description text CHECK (description <> '') - To remove a column, use a command like this: + To remove a column, use a command like: ALTER TABLE products DROP COLUMN description; @@ -1250,7 +1250,7 @@ ALTER TABLE products ALTER COLUMN product_no DROP NOT NULL; - To set a new default for a column, use a command like this: + To set a new default for a column, use a command like: ALTER TABLE products ALTER COLUMN price SET DEFAULT 7.77; @@ -1279,7 +1279,7 @@ ALTER TABLE products ALTER COLUMN price DROP DEFAULT; - To convert a column to a different data type, use a command like this: + To convert a column to a different data type, use a command like: ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2); @@ -1488,7 +1488,7 @@ REVOKE ALL ON accounts FROM PUBLIC; Third-party applications can be put into separate schemas so - they cannot collide with the names of other objects. + they do not collide with the names of other objects. @@ -1603,7 +1603,7 @@ CREATE SCHEMA schemaname AUTHORIZATION u In the previous sections we created tables without specifying any - schema names. By default, such tables (and other objects) are + schema names. By default such tables (and other objects) are automatically put into a schema named public. Every new database contains such a schema. Thus, the following are equivalent: @@ -1746,7 +1746,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; By default, users cannot access any objects in schemas they do not - own. To allow that, the owner of the schema needs to grant the + own. To allow that, the owner of the schema must grant the USAGE privilege on the schema. To allow users to make use of the objects in the schema, additional privileges might need to be granted, as appropriate for the object. @@ -1802,7 +1802,7 @@ REVOKE CREATE ON SCHEMA public FROM PUBLIC; such names, to ensure that you won't suffer a conflict if some future version defines a system table named the same as your table. (With the default search path, an unqualified reference to - your table name would be resolved as the system table instead.) + your table name would be resolved as a system table instead.) System tables will continue to follow the convention of having names beginning with pg_, so that they will not conflict with unqualified user-table names so long as users avoid @@ -2024,7 +2024,7 @@ WHERE c.altitude > 500; SELECT p.relname, c.name, c.altitude FROM cities c, pg_class p -WHERE c.altitude > 500 and c.tableoid = p.oid; +WHERE c.altitude > 500 AND c.tableoid = p.oid; which returns: @@ -2130,7 +2130,7 @@ VALUES ('New York', NULL, NULL, 'NY'); Table access permissions are not automatically inherited. 
Therefore, a user attempting to access a parent table must either have permissions
-    to do the operation on all its child tables as well, or must use the
+    to do the same operation on all its child tables as well, or must use the
     ONLY notation. When adding a new child table to an existing
     inheritance hierarchy, be careful to grant all the needed permissions on it.
@@ -2197,7 +2197,7 @@ VALUES ('New York', NULL, NULL, 'NY');
     These deficiencies will probably be fixed in some future release,
     but in the meantime considerable care is needed in deciding whether
-    inheritance is useful for your problem.
+    inheritance is useful for your application.
@@ -2374,7 +2374,7 @@ CHECK ( outletID >= 100 AND outletID < 200 )
     Ensure that the constraints guarantee that there is no overlap between the
     key values permitted in different partitions. A common
-    mistake is to set up range constraints like this:
+    mistake is to set up range constraints like:
 CHECK ( outletID BETWEEN 100 AND 200 )
 CHECK ( outletID BETWEEN 200 AND 300 )
@@ -2424,7 +2424,7 @@ CHECK ( outletID BETWEEN 200 AND 300 )
     For example, suppose we are constructing a database for a large
     ice cream company. The company measures peak temperatures every
     day as well as ice cream sales in each region. Conceptually,
-    we want a table like this:
+    we want a table like:
 CREATE TABLE measurement (
@@ -2571,12 +2571,15 @@ CREATE TRIGGER insert_measurement_trigger
 CREATE OR REPLACE FUNCTION measurement_insert_trigger()
 RETURNS TRIGGER AS $$
 BEGIN
-    IF ( NEW.logdate >= DATE '2006-02-01' AND NEW.logdate < DATE '2006-03-01' ) THEN
+    IF ( NEW.logdate >= DATE '2006-02-01' AND
+         NEW.logdate < DATE '2006-03-01' ) THEN
         INSERT INTO measurement_y2006m02 VALUES (NEW.*);
-    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND NEW.logdate < DATE '2006-04-01' ) THEN
+    ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
+            NEW.logdate < DATE '2006-04-01' ) THEN
         INSERT INTO measurement_y2006m03 VALUES (NEW.*);
     ...
-    ELSIF ( NEW.logdate >= DATE '2008-01-01' AND NEW.logdate < DATE '2008-02-01' ) THEN
+    ELSIF ( NEW.logdate >= DATE '2008-01-01' AND
+            NEW.logdate < DATE '2008-02-01' ) THEN
         INSERT INTO measurement_y2008m01 VALUES (NEW.*);
     ELSE
         RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
@@ -2706,9 +2709,9 @@ SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01';
     Without constraint exclusion, the above query would scan each of
     the partitions of the measurement table. With constraint
     exclusion enabled, the planner will examine the constraints of each
-    partition and try to prove that the partition need not
-    be scanned because it could not contain any rows meeting the query's
-    WHERE clause. When the planner can prove this, it
+    partition and try to determine which partitions need not
+    be scanned because they cannot contain any rows meeting the query's
+    WHERE clause. When the planner can determine this, it
     excludes the partition from the query plan.
@@ -2875,7 +2878,7 @@ UNION ALL SELECT * FROM measurement_y2008m01;
     If you are using manual VACUUM or
     ANALYZE commands, don't forget that
-    you need to run them on each partition individually. A command like
+    you need to run them on each partition individually. A command like:
 ANALYZE measurement;
@@ -2903,7 +2906,7 @@ ANALYZE measurement;
-    Keep the partitioning constraints simple, else the planner may not be
+    Keep the partitioning constraints simple or else the planner may not be
     able to prove that partitions don't need to be visited.
Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding @@ -2937,7 +2940,7 @@ ANALYZE measurement; that exist in a database. Many other kinds of objects can be created to make the use and management of the data more efficient or convenient. They are not discussed in this chapter, but we give - you a list here so that you are aware of what is possible. + you a list here so that you are aware of what is possible: @@ -2988,7 +2991,7 @@ ANALYZE measurement; When you create complex database structures involving many tables with foreign key constraints, views, triggers, functions, etc. you - will implicitly create a net of dependencies between the objects. + implicitly create a net of dependencies between the objects. For instance, a table with a foreign key constraint depends on the table it references. @@ -3008,7 +3011,7 @@ ERROR: cannot drop table products because other objects depend on it HINT: Use DROP ... CASCADE to drop the dependent objects too. The error message contains a useful hint: if you do not want to - bother deleting all the dependent objects individually, you can run + bother deleting all the dependent objects individually, you can run: DROP TABLE products CASCADE; @@ -3024,7 +3027,7 @@ DROP TABLE products CASCADE; the possible dependencies varies with the type of the object. You can also write RESTRICT instead of CASCADE to get the default behavior, which is to - prevent drops of objects that other objects depend on. + prevent the dropping of objects that other objects depend on. diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml index 707501bb04..08fd5b7630 100644 --- a/doc/src/sgml/dml.sgml +++ b/doc/src/sgml/dml.sgml @@ -1,4 +1,4 @@ - + Data Manipulation @@ -14,7 +14,7 @@ table data. We also introduce ways to effect automatic data changes when certain events occur: triggers and rewrite rules. The chapter after this will finally explain how to extract your long-lost data - back out of the database. + from the database. @@ -33,14 +33,14 @@ do before a database can be of much use is to insert data. Data is conceptually inserted one row at a time. Of course you can also insert more than one row, but there is no way to insert less than - one row at a time. Even if you know only some column values, a + one row. Even if you know only some column values, a complete row must be created. To create a new row, use the command. The command requires the - table name and a value for each of the columns of the table. For + table name and column values. For example, consider the products table from : CREATE TABLE products ( @@ -60,7 +60,7 @@ INSERT INTO products VALUES (1, 'Cheese', 9.99); The above syntax has the drawback that you need to know the order - of the columns in the table. To avoid that you can also list the + of the columns in the table. To avoid this you can also list the columns explicitly. For example, both of the following commands have the same effect as the one above: @@ -137,15 +137,15 @@ INSERT INTO products (product_no, name, price) VALUES To perform an update, you need three pieces of information: - The name of the table and column to update, + The name of the table and column to update - The new value of the column, + The new value of the column - Which row(s) to update. + Which row(s) to update @@ -153,10 +153,10 @@ INSERT INTO products (product_no, name, price) VALUES Recall from that SQL does not, in general, provide a unique identifier for rows. 
Therefore it is not - necessarily possible to directly specify which row to update. + always possible to directly specify which row to update. Instead, you specify which conditions a row must meet in order to - be updated. Only if you have a primary key in the table (no matter - whether you declared it or not) can you reliably address individual rows, + be updated. Only if you have a primary key in the table (independent of + whether you declared it or not) can you reliably address individual rows by choosing a condition that matches the primary key. Graphical database access tools rely on this fact to allow you to update rows individually. @@ -177,7 +177,7 @@ UPDATE products SET price = 10 WHERE price = 5; UPDATE followed by the table name. As usual, the table name can be schema-qualified, otherwise it is looked up in the path. Next is the key word SET followed - by the column name, an equals sign and the new column value. The + by the column name, an equal sign, and the new column value. The new column value can be any scalar expression, not just a constant. For example, if you want to raise the price of all products by 10% you could use: @@ -248,7 +248,10 @@ DELETE FROM products WHERE price = 10; DELETE FROM products; - then all rows in the table will be deleted! Caveat programmer. + then all rows in the table will be deleted! ( can also be used + to delete all rows.) + Caveat programmer.
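
    To make the distinction concrete, here is a minimal sketch reusing the
    products table from the examples above (the TRUNCATE command is the
    row-wiping alternative alluded to in the parenthetical note):

DELETE FROM products WHERE price = 10;  -- removes only the rows matching the condition
DELETE FROM products;                   -- removes every row, one row at a time
TRUNCATE products;                      -- also empties the table, in a single bulk operation

    TRUNCATE is faster on large tables because it does not scan the rows,
    but for the same reason it does not fire the table's per-row ON DELETE
    triggers.
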
diff --git a/doc/src/sgml/docguide.sgml b/doc/src/sgml/docguide.sgml index 4cdae29623..e37eac587e 100644 --- a/doc/src/sgml/docguide.sgml +++ b/doc/src/sgml/docguide.sgml @@ -1,4 +1,4 @@ - + Documentation @@ -358,7 +358,7 @@ CATALOG "dsssl/catalog" Create the directory /usr/local/share/sgml/docbook-4.2 and change to it. (The exact location is irrelevant, but this one is - reasonable within the layout we are following here.) + reasonable within the layout we are following here.): $ mkdir /usr/local/share/sgml/docbook-4.2 $ cd /usr/local/share/sgml/docbook-4.2 @@ -368,7 +368,7 @@ CATALOG "dsssl/catalog" - Unpack the archive. + Unpack the archive: $ unzip -a ...../docbook-4.2.zip @@ -392,7 +392,7 @@ CATALOG "docbook-4.2/docbook.cat" Download the ISO 8879 character entities archive, unpack it, and put the - files in the same directory you put the DocBook files in. + files in the same directory you put the DocBook files in: $ cd /usr/local/share/sgml/docbook-4.2 $ unzip ...../ISOEnts.zip @@ -421,7 +421,7 @@ perl -pi -e 's/iso-(.*).gml/ISO\1/g' docbook.cat To install the style sheets, unzip and untar the distribution and move it to a suitable place, for example /usr/local/share/sgml. (The archive will - automatically create a subdirectory.) + automatically create a subdirectory.): $ gunzip docbook-dsssl-1.xx.tar.gz $ tar -C /usr/local/share/sgml -xf docbook-dsssl-1.xx.tar @@ -652,7 +652,7 @@ gmake man.tar.gz D2MDIR=directory doc/src/sgml$ gmake postgres-A4.pdf - or + or: doc/src/sgml$ gmake postgres-US.pdf @@ -738,7 +738,6 @@ save_size.pdfjadetex = 15000 following one. A utility, fixrtf, is available in doc/src/sgml to accomplish these repairs: - doc/src/sgml$ ./fixrtf --refentry postgres.rtf diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml index fdb55c4aec..33ece2f3ec 100644 --- a/doc/src/sgml/ecpg.sgml +++ b/doc/src/sgml/ecpg.sgml @@ -1,4 +1,4 @@ - + <application>ECPG</application> - Embedded <acronym>SQL</acronym> in C @@ -750,7 +750,7 @@ EXEC SQL DEALLOCATE PREPARE name; The pgtypes library maps PostgreSQL database types to C equivalents that can be used in C programs. It also offers - functions to do basic calculations with those types within C, i.e. without + functions to do basic calculations with those types within C, i.e., without the help of the PostgreSQL server. See the following example: The function receives the date dDate as its only parameter. - It will output the date in the form 1999-01-18, i.e. in the + It will output the date in the form 1999-01-18, i.e., in the YYYY-MM-DD format. diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5c3580aec6..b314de3212 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1,4 +1,4 @@ - + Functions and Operators @@ -17,16 +17,16 @@ define their own functions and operators, as described in . The psql commands \df and - \do can be used to show the list of all actually + \do can be used to list all available functions and operators, respectively. - If you are concerned about portability then take note that most of + If you are concerned about portability then note that most of the functions and operators described in this chapter, with the exception of the most trivial arithmetic and comparison operators and some explicitly marked functions, are not specified by the - SQL standard. Some of the extended functionality + SQL standard. 
Some of this extended functionality is present in other SQL database management systems, and in many cases this functionality is compatible and consistent between the various implementations. This chapter is also @@ -247,8 +247,8 @@ - Comparison operators are available for all data types where this - makes sense. All comparison operators are binary operators that + Comparison operators are available for all relevant data types. + All comparison operators are binary operators that return values of type boolean; expressions like 1 < 2 < 3 are not valid (because there is no < operator to compare a Boolean value with @@ -260,7 +260,7 @@ BETWEEN In addition to the comparison operators, the special - BETWEEN construct is available. + BETWEEN construct is available: a BETWEEN x AND y @@ -268,7 +268,8 @@ a >= x AND a <= y - Similarly, + Note BETWEEN is inclusive in comparing the endpoint + values. NOT BETWEEN does the opposite comparison: a NOT BETWEEN x AND y @@ -276,9 +277,6 @@ a < x OR a > y - There is no difference between the two respective forms apart from - the CPU cycles required to rewrite the first one - into the second one internally. BETWEEN SYMMETRIC @@ -300,12 +298,12 @@ NOTNULL - To check whether a value is or is not null, use the constructs + To check whether a value is or is not null, use the constructs: expression IS NULL expression IS NOT NULL - or the equivalent, but nonstandard, constructs + or the equivalent, but nonstandard, constructs: expression ISNULL expression NOTNULL @@ -324,7 +322,7 @@ - Some applications might expect that + Some applications might expect expression = NULL returns true if expression evaluates to the null value. It is highly recommended that these applications @@ -332,9 +330,7 @@ cannot be done the configuration variable is available. If it is enabled, PostgreSQL will convert x = - NULL clauses to x IS NULL. This was - the default behavior in PostgreSQL - releases 6.5 through 7.1. + NULL clauses to x IS NULL. @@ -346,7 +342,7 @@ IS NOT NULL is true when the row expression itself is non-null and all the row's fields are non-null. Because of this behavior, IS NULL and IS NOT NULL do not always return - inverse results for row-valued expressions, i.e. a row-valued + inverse results for row-valued expressions, i.e., a row-valued expression that contains both NULL and non-null values will return false for both tests. This definition conforms to the SQL standard, and is a change from the @@ -362,17 +358,19 @@ IS NOT DISTINCT FROM - The ordinary comparison operators yield null (signifying unknown) - when either input is null. Another way to do comparisons is with the + Ordinary comparison operators yield null (signifying unknown) + when either input is null, not true or false, e.g., 7 = + NULL yields null. + Another way to do comparisons is with the IS NOT DISTINCT FROM construct: expression IS DISTINCT FROM expression expression IS NOT DISTINCT FROM expression For non-null inputs, IS DISTINCT FROM is - the same as the <> operator. However, when both - inputs are null it will return false, and when just one input is - null it will return true. Similarly, IS NOT DISTINCT + the same as the <> operator. However, if both + inputs are null it returns false, and if only one input is + null it returns true. Similarly, IS NOT DISTINCT FROM is identical to = for non-null inputs, but it returns true when both inputs are null, and false when only one input is null. 
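
    A quick sketch of these rules in action (boolean results shown as
    t, f, or empty for null, as psql displays them):

SELECT 7 = NULL;                        -- null (unknown), not false
SELECT 7 IS DISTINCT FROM NULL;         -- t
SELECT NULL IS DISTINCT FROM NULL;      -- f
SELECT NULL IS NOT DISTINCT FROM NULL;  -- t
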
Thus, these constructs effectively act as though null @@ -442,8 +440,8 @@ Mathematical operators are provided for many - PostgreSQL types. For types without - common mathematical conventions for all possible permutations + PostgreSQL types. For types that support + only limited mathematical operations (e.g., date/time types) we describe the actual behavior in subsequent sections. @@ -489,7 +487,7 @@ / - division (integer division truncates results) + division (integer division truncates the result) 4 / 2 2 @@ -686,7 +684,7 @@ abs(x) - (same as x) + (same as input) absolute value abs(-17.4) 17.4 @@ -820,7 +818,7 @@ random() dp - random value between 0.0 and 1.0 + random value between 0.0 and 1.0, inclusive random() @@ -844,7 +842,8 @@ setseed(dp) void - set seed for subsequent random() calls (value between -1.0 and 1.0) + set seed for subsequent random() calls (value between -1.0 and + 1.0, inclusive) setseed(0.54823) @@ -1332,8 +1331,8 @@ ASCII code of the first character of the argument. For UTF8 returns the Unicode code - point of the character. For other multibyte encodings. the - argument must be a strictly ASCII character. + point of the character. For other multibyte encodings, the + argument must be an ASCII character. ascii('x') 120 @@ -1358,7 +1357,7 @@ Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte - encodings the argument must designate a strictly + encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. @@ -1383,7 +1382,8 @@ linkend="conversion-names"> for available conversions. convert('text_in_utf8', 'UTF8', 'LATIN1') - text_in_utf8 represented in ISO 8859-1 encoding + text_in_utf8 represented in Latin-1 + encoding (ISO 8859-1) @@ -1796,8 +1796,8 @@ The conversion names follow a standard naming scheme: The official name of the source encoding with all non-alphanumeric characters replaced by underscores followed - by _to_ followed by the equally processed - destination encoding name. Therefore the names might deviate + by _to_ followed by similarly + destination encoding name. Therefore, the names might deviate from the customary encoding names. @@ -2585,12 +2585,11 @@ - SQL defines some string functions with a - special syntax where - certain key words rather than commas are used to separate the + SQL defines some string functions that use + a key word syntax, rather than commas to separate arguments. Details are in . - Some functions are also implemented using the regular syntax for + Such functions are also implemented using the regular syntax for function invocation. (See .) @@ -2932,7 +2931,7 @@ cast(-44 as bit(12)) 111111010100 '1110'::bit(4)::integer 14 Note that casting to just bit means casting to - bit(1), and so it will deliver only the least significant + bit(1), and so will deliver only the least significant bit of the integer. @@ -2964,7 +2963,8 @@ cast(-44 as bit(12)) 111111010100 SQL:1999), and POSIX-style regular expressions. Aside from the basic does this string match this pattern? operators, functions are available to extract - or replace matching substrings and to split a string at the matches. + or replace matching substrings and to split a string at matching + locations. @@ -2987,10 +2987,9 @@ cast(-44 as bit(12)) 111111010100 - Every pattern defines a set of strings. - The LIKE expression returns true if the - string is contained in the set of - strings represented by pattern. 
(As + The LIKE expression returns true if + string matches the supplied + pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is @@ -3019,13 +3018,13 @@ cast(-44 as bit(12)) 111111010100 - LIKE pattern matches always cover the entire - string. To match a sequence anywhere within a string, the - pattern must therefore start and end with a percent sign. + LIKE pattern matching always covers the entire + string. Therefore, to match a sequence anywhere within a string, the + pattern must start and end with a percent sign. - To match a literal underscore or percent sign without matching + To match only a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape @@ -3042,7 +3041,7 @@ cast(-44 as bit(12)) 111111010100 actually matches a literal backslash means writing four backslashes in the statement. You can avoid this by selecting a different escape character with ESCAPE; then a backslash is not special to - LIKE anymore. (But it is still special to the string + LIKE anymore. (But backslash is still special to the string literal parser, so you still need two of them.) @@ -3095,7 +3094,7 @@ cast(-44 as bit(12)) 111111010100 The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. - It is much like LIKE, except that it + It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common regular @@ -3103,9 +3102,9 @@ cast(-44 as bit(12)) 111111010100 - Like LIKE, the SIMILAR TO + Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; - this is unlike common regular expression practice, wherein the pattern + this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses @@ -3153,7 +3152,7 @@ cast(-44 as bit(12)) 111111010100 Notice that bounded repetition (? and {...}) - are not provided, though they exist in POSIX. Also, the dot (.) + is not provided, though they exist in POSIX. Also, the period (.) is not a metacharacter. @@ -3180,7 +3179,7 @@ cast(-44 as bit(12)) 111111010100 escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the - specified pattern must match to the entire data string, else the + specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote @@ -3190,7 +3189,7 @@ cast(-44 as bit(12)) 111111010100 - Some examples: + Some examples, with #" delimiting the return string: substring('foobar' from '%#"o_b#"%' for '#') oob substring('foobar' from '#"o_b#"%' for '#') NULL @@ -3284,7 +3283,7 @@ substring('foobar' from '#"o_b#"%' for '#') NULLLIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use - different special characters than LIKE does. + different special characters than LIKE. 
Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or @@ -3505,9 +3504,9 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; PostgreSQL's regular expressions are implemented - using a package written by Henry Spencer. Much of + using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his - manual entry. + manual. @@ -3519,7 +3518,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions - that are not in the POSIX standard, but have become widely used anyway + that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs @@ -3536,7 +3535,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; PostgreSQL can be chosen by setting the run-time parameter. The usual setting is advanced, but one might choose - extended for maximum backwards compatibility with + extended for backwards compatibility with pre-7.4 releases of PostgreSQL. @@ -3551,7 +3550,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; A branch is zero or more quantified atoms or constraints, concatenated. - It matches a match for the first, followed by a match for the second, etc; + It tries a match of the first, followed by a match for the second, etc; an empty branch matches the empty string. @@ -3568,8 +3567,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; A constraint matches an empty string, but matches only when - specific conditions are met. A constraint can be used where an atom - could be used, except it cannot be followed by a quantifier. + specific conditions are met. A constraint cannot be followed by a quantifier. The simple constraints are shown in ; some more constraints are described later. @@ -3618,7 +3616,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; \k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, - e.g. \\ matches a backslash character + e.g., \\ matches a backslash character @@ -3756,7 +3754,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; - A quantifier cannot immediately follow another quantifier. + A quantifier cannot immediately follow another quantifier, e.g., + ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |. @@ -3777,12 +3776,12 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; ^ - matches at the beginning of the string + matches the beginning of the string $ - matches at the end of the string + matches the end of the string @@ -3822,21 +3821,21 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, - e.g. [0-9] in ASCII matches + e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an - endpoint, e.g. a-c-e. Ranges are very + endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them. 
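
    For instance, assuming an ASCII-based collation, bracket-expression
    ranges behave as described (a small sketch using the ~ operator
    introduced above):

SELECT 'b' ~ '[a-c]';      -- t: b falls within the range a-c
SELECT '7' ~ '[0-9]';      -- t: any decimal digit matches
SELECT 'x' ~ '[^a-c]';     -- t: ^ negates the list, and x is outside a-c
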
To include a literal ] in the list, make it the - first character (following a possible ^). To + first character (possibly following a ^). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - - as the first endpoint of a range, enclose it + - as the start of a range, enclose it in [. and .] to make it a - collating element (see below). With the exception of these characters, + collating element (see below). With the exception of these characters and some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. @@ -3851,9 +3850,10 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is - a single element of the bracket expression's list. A bracket - expression containing a multiple-character collating element can thus - match more than one character, e.g. if the collating sequence + treated as a single element of the bracket expression's list. This + allows a bracket + expression containing a multiple-character collating element to + match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc. @@ -3861,15 +3861,15 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; - PostgreSQL currently has no multicharacter collating + PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior. Within a bracket expression, a collating element enclosed in - [= and =] is an equivalence - class, standing for the sequences of characters of all collating + [= and =] is an equivalence + class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and @@ -3910,8 +3910,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; or an underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. - The constraint escapes described below are usually preferable (they - are no more standard, but are certainly easier to type). + The constraint escapes described below are usually preferable; they + are no more standard, but are easier to type. @@ -3933,7 +3933,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; Character-entry escapes exist to make it easier to specify - non-printing and otherwise inconvenient characters in REs. They are + non-printing and inconvenient characters in REs. They are shown in . 
@@ -3996,7 +3996,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; \B - synonym for \ to help reduce the need for backslash + synonym for backslash (\) to help reduce the need for backslash doubling @@ -4038,14 +4038,14 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; \uwxyz (where wxyz is exactly four hexadecimal digits) the UTF16 (Unicode, 16-bit) character U+wxyz - in the local byte ordering + in the local byte encoding \Ustuvwxyz (where stuvwxyz is exactly eight hexadecimal digits) - reserved for a somewhat-hypothetical Unicode extension to 32 bits + reserved for a hypothetical Unicode extension to 32 bits @@ -4055,34 +4055,34 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; - \xhhh - (where hhh is any sequence of hexadecimal + \x### + (where ### is any sequence of hexadecimal digits) the character whose hexadecimal value is - 0xhhh + 0x### (a single character no matter how many hexadecimal digits are used) \0 - the character whose value is 0 + the character whose value is 0 (the null byte) - \xy - (where xy is exactly two octal digits, + \## + (where ## is exactly two octal digits, and is not a back reference) the character whose octal value is - 0xy + 0## - \xyz - (where xyz is exactly three octal digits, + \### + (where ### is exactly three octal digits, and is not a back reference) the character whose octal value is - 0xyz + 0### @@ -4245,7 +4245,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; - There is an inherent historical ambiguity between octal character-entry + There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by heuristics, as hinted at above. A leading zero always indicates an octal escape. @@ -4253,7 +4253,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; is always taken as a back reference. A multidigit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression - (i.e. the number is in the legal range for a back reference), + (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal. @@ -4401,7 +4401,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; - white space and comments cannot appear within multicharacter symbols, + white space and comments cannot appear within multi-character symbols, such as (?: @@ -4417,7 +4417,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo; (where ttt is any text not containing a )) is a comment, completely ignored. Again, this is not allowed between the characters of - multicharacter symbols, like (?:. + multi-character symbols, like (?:. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead. @@ -4566,9 +4566,9 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, - e.g. x becomes [xX]. + e.g., x becomes [xX]. When it appears inside a bracket expression, all case counterparts - of it are added to the bracket expression, e.g. + of it are added to the bracket expression, e.g., [x] becomes [xX] and [^x] becomes [^xX]. 
@@ -4670,7 +4670,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); BREs differ from EREs in several respects. - |, +, and ? + In BREs, |, +, and ? are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are @@ -4691,7 +4691,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); \< and \> are synonyms for [[:<:]] and [[:>:]] - respectively; no other escapes are available. + respectively; no other escapes are available in BREs. @@ -4732,8 +4732,10 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); template that defines the output or input format. - The to_timestamp function can also take a single - double precision argument to convert from Unix epoch to + A single-argument to_timestamp function is also + available; it accepts a + double precision argument and converts from Unix epoch + (seconds since 1970-01-01 00:00:00+00) to timestamp with time zone. (Integer Unix epochs are implicitly cast to double precision.) @@ -4804,19 +4806,18 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); to_timestamp(double precision) timestamp with time zone convert UNIX epoch to time stamp - to_timestamp(200120400) + to_timestamp(1284352323) - In an output template string (for to_char), there are certain patterns that are - recognized and replaced with appropriately-formatted data from the value - to be formatted. Any text that is not a template pattern is simply - copied verbatim. Similarly, in an input template string (for anything but to_char), template patterns - identify the parts of the input data string to be looked at and the - values to be found there. + In a to_char output template string, there are certain patterns that are + recognized and replaced with appropriately-formatted data based on the value. + Any text that is not a template pattern is simply + copied verbatim. Similarly, in an input template string (anything but to_char), template patterns + identify the values to be supplied by the input data string. @@ -4928,7 +4929,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); Month - full mixed-case month name (blank-padded to 9 chars) + full capitalized month name (blank-padded to 9 chars) month @@ -4940,7 +4941,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); Mon - abbreviated mixed-case month name (3 chars in English, localized lengths vary) + abbreviated capitalized month name (3 chars in English, localized lengths vary) mon @@ -4956,7 +4957,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); Day - full mixed-case day name (blank-padded to 9 chars) + full capitalized day name (blank-padded to 9 chars) day @@ -4968,7 +4969,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); Dy - abbreviated mixed-case day name (3 chars in English, localized lengths vary) + abbreviated capitalized day name (3 chars in English, localized lengths vary) dy @@ -5004,7 +5005,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); IW - ISO week number of year (1 - 53; the first Thursday of the new year is in week 1.) + ISO week number of year (01 - 53; the first Thursday of the new year is in week 1.) 
CC @@ -5020,26 +5021,26 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); RM - month in Roman numerals (I-XII; I=January) (uppercase) + uppercase month in Roman numerals (I-XII; I=January) rm - month in Roman numerals (i-xii; i=January) (lowercase) + lowercase month in Roman numerals (i-xii; i=January) TZ - time-zone name (uppercase) + uppercase time-zone name tz - time-zone name (lowercase) + lowercase time-zone name - Certain modifiers can be applied to any template pattern to alter its + Modifiers can be applied to any template pattern to alter its behavior. For example, FMMonth is the Month pattern with the FM modifier. @@ -5060,18 +5061,18 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); FM prefix - fill mode (suppress padding blanks and zeroes) + fill mode (suppress padding of blanks and zeroes) FMMonth TH suffix uppercase ordinal number suffix - DDTH + DDTH, e.g., 12TH th suffix lowercase ordinal number suffix - DDth + DDth, e.g., 12th FX prefix @@ -5086,7 +5087,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); SP suffix - spell mode (not yet implemented) + spell mode (not supported) DDSP @@ -5114,12 +5115,13 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); to_timestamp and to_date - skip multiple blank spaces in the input string if the FX option - is not used. FX must be specified as the first item - in the template. For example - to_timestamp('2000    JUN', 'YYYY MON') is correct, but - to_timestamp('2000    JUN', 'FXYYYY MON') returns an error, + skip multiple blank spaces in the input string unless the FX option + is used. For example, + to_timestamp('2000    JUN', 'YYYY MON') works, but + to_timestamp('2000    JUN', 'FXYYYY MON') returns an error because to_timestamp expects one space only. + FX must be specified as the first item in + the template. @@ -5140,15 +5142,15 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); If you want to have a double quote in the output you must precede it with a backslash, for example E'\\"YYYY Month\\"'. - (Two backslashes are necessary because the backslash already - has a special meaning when using the escape string syntax.) + (Two backslashes are necessary because the backslash + has special meaning when using the escape string syntax.) The YYYY conversion from string to timestamp or - date has a restriction if you use a year with more than 4 digits. You must + date has a restriction when processing years with more than 4 digits. You must use some non-digit character or template after YYYY, otherwise the year is always interpreted as 4 digits. For example (with the year 20000): @@ -5163,7 +5165,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); In conversions from string to timestamp or - date, the CC field is ignored if there + date, the CC field (century) is ignored if there is a YYY, YYYY or Y,YYY field. If CC is used with YY or Y then the year is computed @@ -5173,16 +5175,22 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); - An ISO week date (as distinct from a Gregorian date) can be specified to to_timestamp and to_date in one of two ways: + An ISO week date (as distinct from a Gregorian date) can be + specified to to_timestamp and + to_date in one of two ways: - Year, week and weekday, for example to_date('2006-42-4', 'IYYY-IW-ID') returns the date 2006-10-19. If you omit the weekday it is assumed to be 1 (Monday). + Year, week, and weekday: for example to_date('2006-42-4', + 'IYYY-IW-ID') returns the date + 2006-10-19. If you omit the weekday it + is assumed to be 1 (Monday). 
- Year and day of year, for example to_date('2006-291', 'IYYY-IDDD') also returns 2006-10-19. + Year and day of year: for example to_date('2006-291', + 'IYYY-IDDD') also returns 2006-10-19. @@ -5192,16 +5200,17 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); Gregorian date fields is nonsensical, and will cause an error. In the context of an ISO year, the concept of a month or day of month has no meaning. In the context of a Gregorian year, the - ISO week has no meaning. Users should take care to keep Gregorian and - ISO date specifications separate. + ISO week has no meaning. Users should avoid mixing Gregorian and + ISO date specifications. - Millisecond (MS) and microsecond (US) - values in a conversion from string to timestamp are used as part of the - seconds after the decimal point. For example + In a conversion from string to timestamp, millisecond + (MS) and microsecond (US) + values are used as the + seconds digits after the decimal point. For example to_timestamp('12:3', 'SS:MS') is not 3 milliseconds, but 300, because the conversion counts it as 12 + 0.3 seconds. This means for the format SS:MS, the input values @@ -5232,7 +5241,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); to_char(interval) formats HH and HH12 as hours in a single day, while HH24 - can output hours exceeding a single day, e.g. >24. + can output hours exceeding a single day, e.g., >24. @@ -5304,7 +5313,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); RN - roman numeral (input between 1 and 3999) + Roman numeral (input between 1 and 3999) TH or th @@ -5316,7 +5325,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); EEEE - scientific notation (not implemented yet) + scientific notation (not implemented) @@ -5331,10 +5340,10 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); A sign formatted using SG, PL, or MI is not anchored to the number; for example, - to_char(-12, 'S9999') produces '  -12', - but to_char(-12, 'MI9999') produces '-  12'. + to_char(-12, 'MI9999') produces '-  12' + but to_char(-12, 'S9999') produces '  -12'. The Oracle implementation does not allow the use of - MI ahead of 9, but rather + MI before 9, but rather requires that 9 precede MI. @@ -5371,8 +5380,8 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); n is the number of digits following V. to_char does not support the use of - V combined with a decimal point. - (E.g., 99.9V99 is not allowed.) + V with non-integer values. + (e.g., 99.9V99 is not allowed.) @@ -5666,7 +5675,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); - date '2001-10-01' - date '2001-09-28' - integer '3' + integer '3' (days) @@ -5819,7 +5828,7 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); age(timestamp) interval - Subtract from current_date + Subtract from current_date (at midnight) age(timestamp '1957-06-13') 43 years 8 mons 3 days @@ -5941,16 +5950,16 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); justify_days(interval) interval Adjust interval so 30-day time periods are represented as months - justify_days(interval '30 days') - 1 month + justify_days(interval '35 days') + 1 mon 5 days justify_hours(interval) interval Adjust interval so 24-hour time periods are represented as days - justify_hours(interval '24 hours') - 1 day + justify_hours(interval '27 hours') + 1 day 03:00:00 @@ -6094,8 +6103,8 @@ EXTRACT(field FROM source) such as year or hour from date/time values. source must be a value expression of type timestamp, time, or interval. 
- (Expressions of type date will - be cast to timestamp and can therefore be used as + (Expressions of type date are + cast to timestamp and can therefore be used as well.) field is an identifier or string that selects what field to extract from the source value. The extract function returns values of type @@ -6108,7 +6117,7 @@ EXTRACT(field FROM source) century - The century + The century: @@ -6122,7 +6131,7 @@ SELECT EXTRACT(CENTURY FROM TIMESTAMP '2001-02-16 20:38:40'); The first century starts at 0001-01-01 00:00:00 AD, although they did not know it at the time. This definition applies to all Gregorian calendar countries. There is no century number 0, - you go from -1 to 1. + you go from -1 century to 1 century. If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican. @@ -6178,7 +6187,7 @@ SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40'); Note that extract's day of the week numbering - is different from that of the to_char(..., + differs from that of the to_char(..., 'D') function. @@ -6204,7 +6213,7 @@ SELECT EXTRACT(DOY FROM TIMESTAMP '2001-02-16 20:38:40'); For date and timestamp values, the - number of seconds since 1970-01-01 00:00:00-00 (can be negative); + number of seconds since 1970-01-01 00:00:00-00 GMT (can be negative); for interval values, the total number of seconds in the interval @@ -6266,7 +6275,7 @@ SELECT EXTRACT(ISODOW FROM TIMESTAMP '2001-02-18 20:38:40'); isoyear - The ISO 8601 year that the date falls in (not applicable to intervals). + The ISO 8601 year that the date falls in (not applicable to intervals) @@ -6290,7 +6299,7 @@ SELECT EXTRACT(ISOYEAR FROM DATE '2006-01-02'); The seconds field, including fractional parts, multiplied by 1 - 000 000. Note that this includes full seconds. + 000 000; note that this includes full seconds @@ -6314,7 +6323,7 @@ SELECT EXTRACT(MILLENNIUM FROM TIMESTAMP '2001-02-16 20:38:40'); Years in the 1900s are in the second millennium. - The third millennium starts January 1, 2001. + The third millennium started January 1, 2001. @@ -6380,7 +6389,7 @@ SELECT EXTRACT(MONTH FROM INTERVAL '2 years 13 months'); quarter - The quarter of the year (1 - 4) that the day is in + The quarter of the year (1 - 4) that the date is in @@ -6527,8 +6536,8 @@ date_trunc('field', source source is a value expression of type timestamp or interval. (Values of type date and - time are cast automatically, to timestamp or - interval respectively.) + time are cast automatically to timestamp or + interval, respectively.) field selects to which precision to truncate the input value. The return value is of type timestamp or interval @@ -6611,7 +6620,8 @@ SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40'); timestamp with time zone AT TIME ZONE zone timestamp without time zone - Convert given time stamp with time zone to the new time zone + Convert given time stamp with time zone to the new time + zone, with no time zone designation @@ -6634,7 +6644,7 @@ SELECT date_trunc('year', TIMESTAMP '2001-02-16 20:38:40'); - Examples (supposing that the local time zone is PST8PDT): + Examples (assuming the local time zone is PST8PDT): SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'MST'; Result: 2001-02-16 19:38:40-08 @@ -6698,7 +6708,7 @@ LOCALTIMESTAMP(precision) CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP - can optionally be given + can optionally take a precision parameter, which causes the result to be rounded to that many fractional digits in the seconds field. 
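
    For example (the results shown are illustrative; actual values depend
    on the moment of execution):

SELECT CURRENT_TIMESTAMP(2);   -- e.g., 2009-04-27 16:27:36.54-04
SELECT LOCALTIMESTAMP(0);      -- e.g., 2009-04-27 16:27:37
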
Without a precision parameter, the result is given to the full available precision. @@ -6747,20 +6757,15 @@ SELECT LOCALTIMESTAMP; current time at the instant the function is called. The complete list of non-SQL-standard time functions is: -now() transaction_timestamp() statement_timestamp() clock_timestamp() timeofday() +now() - now() is a traditional PostgreSQL - equivalent to CURRENT_TIMESTAMP. - transaction_timestamp() is likewise equivalent to - CURRENT_TIMESTAMP, but is named to clearly reflect - what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). @@ -6774,6 +6779,11 @@ timeofday() clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone value. + now() is a traditional PostgreSQL + equivalent to CURRENT_TIMESTAMP. + transaction_timestamp() is likewise equivalent to + CURRENT_TIMESTAMP, but is named to clearly reflect + what it returns. @@ -7135,7 +7145,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple Before PostgreSQL 8.2, the containment operators @> and <@ were respectively called ~ and @. These names are still - available, but are deprecated and will eventually be retired. + available, but are deprecated and will eventually be removed. @@ -7406,7 +7416,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple It is possible to access the two component numbers of a point - as though it were an array with indices 0 and 1. For example, if + as though they were an array with indices 0 and 1. For example, if t.p is a point column then SELECT p[0] FROM t retrieves the X coordinate and UPDATE t SET p[1] = ... changes the Y coordinate. @@ -7422,7 +7432,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple path are non-intersecting. For example, the path '((0,0),(0,1),(2,1),(2,2),(1,2),(1,0),(0,0))'::PATH - won't work, however, the following visually identical + will not work; however, the following visually identical path '((0,0),(0,1),(1,1),(1,2),(2,2),(2,1),(1,1),(1,0),(0,0))'::PATH will work. If the concept of an intersecting versus @@ -7442,8 +7452,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple The operators <<, <<=, >>, and >>= test for subnet inclusion. They - consider only the network parts of the two addresses, ignoring any - host part, and determine whether one network part is identical to + consider only the network parts of the two addresses (ignoring any + host part) and determine whether one network is identical to or a subnet of the other. @@ -7545,8 +7555,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple shows the functions available for use with the cidr and inet - types. The host, - text, and abbrev + types. The abbrev, host, + and text functions are primarily intended to offer alternative display formats. @@ -8066,8 +8076,8 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple The function xmlcomment creates an XML value containing an XML comment with the specified text as content. - The text cannot contain -- or end with a - - so that the resulting construct is a valid + The text cannot contain -- or end with a + - so that the resulting construct is a valid XML comment. If the argument is null, the result is null. 
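
    For example (assuming, as noted earlier, a server built with
    --with-libxml):

SELECT xmlcomment('hello');

  xmlcomment
--------------
 <!--hello-->
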
@@ -8197,7 +8207,7 @@ SELECT xmlelement(name "foo$bar", xmlattributes('xyz' as "a&b")); An explicit attribute name need not be specified if the attribute value is a column reference, in which case the column's name will - be used as attribute name by default. In any other case, the + be used as the attribute name by default. In other cases, the attribute must be given an explicit name. So this example is valid: @@ -8213,7 +8223,7 @@ SELECT xmlelement(name test, xmlattributes(func(a, b))) FROM test; Element content, if specified, will be formatted according to - data type. If the content is itself of type xml, + the data type. If the content is itself of type xml, complex XML documents can be constructed. For example: abc123 -SELECT xmlforest(table_name, column_name) FROM information_schema.columns WHERE table_schema = 'pg_catalog'; +SELECT xmlforest(table_name, column_name) +FROM information_schema.columns +WHERE table_schema = 'pg_catalog'; xmlforest ------------------------------------------------------------------------------------------- @@ -8287,7 +8299,7 @@ SELECT xmlforest(table_name, column_name) FROM information_schema.columns WHERE Note that XML forests are not valid XML documents if they consist - of more than one element. So it might be useful to wrap + of more than one element, so it might be useful to wrap xmlforest expressions in xmlelement. @@ -8330,20 +8342,21 @@ SELECT xmlpi(name php, 'echo "hello world";'); - xmlroot(xml, version text|no value , standalone yes|no|no value) + xmlroot(xml, version text | no value , standalone yes|no|no value) The xmlroot expression alters the properties of the root node of an XML value. If a version is specified, - this replaces the value in the version declaration, if a + this replaces the value in the version declaration; if a standalone value is specified, this replaces the value in the standalone declaration. abc'), version '1.0', standalone yes); +SELECT xmlroot(xmlparse(document 'abc'), + version '1.0', standalone yes); xmlroot ---------------------------------------- @@ -8464,7 +8477,8 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab; Example: test', ARRAY[ARRAY['my', 'http://example.com']]); +SELECT xpath('/my:a/text()', 'test', + ARRAY[ARRAY['my', 'http://example.com']]); xpath -------- @@ -8483,11 +8497,12 @@ SELECT xpath('/my:a/text()', 'test', The following functions map the contents of relational tables to - XML values. They can be thought of as XML export functionality. + XML values. They can be thought of as XML export functionality: table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text) query_to_xml(query text, nulls boolean, tableforest boolean, targetns text) -cursor_to_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text) +cursor_to_xml(cursor refcursor, count int, nulls boolean, + tableforest boolean, targetns text) The return type of each function is xml. @@ -8502,7 +8517,7 @@ cursor_to_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, t query and maps the result set. cursor_to_xml fetches the indicated number of rows from the cursor specified by the parameter - cursor. This variant is recommendable if + cursor. This variant is recommended if large tables have to be mapped, because the result value is built up in memory by each function. @@ -8564,7 +8579,7 @@ cursor_to_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, t The parameter nulls determines whether null values should be included in the output. 
If true, null values in - columns are represented as + columns are represented as: ]]> @@ -8581,9 +8596,8 @@ cursor_to_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, t - The following functions return XML Schema documents describing the - mappings made by the data mappings produced by the corresponding - functions above. + The following functions return XML Schema documents describing the + mappings produced by the corresponding functions above: table_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) query_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) @@ -8597,7 +8611,7 @@ cursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, target The following functions produce XML data mappings and the corresponding XML Schema in one document (or forest), linked together. They can be useful where self-contained and - self-describing results are wanted. + self-describing results are wanted: table_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) @@ -8607,7 +8621,7 @@ query_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targe In addition, the following functions are available to produce analogous mappings of entire schemas or the entire current - database. + database: schema_to_xml(schema name, nulls boolean, tableforest boolean, targetns text) schema_to_xmlschema(schema name, nulls boolean, tableforest boolean, targetns text) @@ -8620,7 +8634,7 @@ database_to_xml_and_xmlschema(nulls boolean, tableforest boolean, targetns text) Note that these potentially produce a lot of data, which needs to be built up in memory. When requesting content mappings of large - schemas or databases, it may be worthwhile to consider mapping the + schemas or databases, it might be worthwhile to consider mapping the tables separately instead, possibly even through a cursor. @@ -8664,12 +8678,12 @@ table2-mapping - As an example for using the output produced by these functions, + As an example of using the output produced by these functions, shows an XSLT stylesheet that converts the output of table_to_xml_and_xmlschema to an HTML document containing a tabular rendition of the table data. In a - similar manner, the result data of these functions can be + similar manner, the results from these functions can be converted into other XML-based formats. @@ -8798,13 +8812,13 @@ table2-mapping - The sequence to be operated on by a sequence-function call is specified by - a regclass argument, which is just the OID of the sequence in the + The sequence to be operated on by a sequence function is specified by + a regclass argument, which is simply the OID of the sequence in the pg_class system catalog. You do not have to look up the OID by hand, however, since the regclass data type's input converter will do the work for you. Just write the sequence name enclosed - in single quotes, so that it looks like a literal constant. To - achieve some compatibility with the handling of ordinary + in single quotes so that it looks like a literal constant. For + compatibility with the handling of ordinary SQL names, the string will be converted to lowercase unless it contains double quotes around the sequence name.
Thus: @@ -8839,7 +8853,7 @@ nextval('foo') searches search path for fo Since this is really just an OID, it will track the originally identified sequence despite later renaming, schema reassignment, etc. This early binding behavior is usually desirable for - sequence references in column defaults and views. But sometimes you will + sequence references in column defaults and views. But sometimes you might want late binding where the sequence reference is resolved at run time. To get late-binding behavior, force the constant to be stored as a text constant instead of regclass: @@ -8881,7 +8895,7 @@ nextval('foo'::text) foo is looked up at Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this - sequence in this session.) Notice that because this is returning + sequence in this session.) Because this is returning a session-local value, it gives a predictable answer whether or not other sessions have executed nextval since the current session did. @@ -8897,8 +8911,8 @@ nextval('foo'::text) foo is looked up at nextval in the current session. This function is identical to currval, except that instead of taking the sequence name as an argument it fetches the - value of the last sequence that nextval - was used on in the current session. It is an error to call + value of the last sequence used by nextval + in the current session. It is an error to call lastval if nextval has not yet been called in the current session. @@ -8916,9 +8930,9 @@ nextval('foo'::text) foo is looked up at nextval will advance the sequence before returning a value. The value reported by currval is also set to the specified value. In the three-parameter form, - is_called can be set either true + is_called can be set to either true or false. true has the same effect as - the two-parameter form. If it's set to false, the + the two-parameter form. If it is set to false, the next nextval will return exactly the specified value, and sequence advancement commences with the following nextval. Furthermore, the value reported by @@ -8941,7 +8955,7 @@ SELECT setval('foo', 42, false); Next nextval wi If a sequence object has been created with default parameters, - nextval calls on it will return successive values + nextval will return successive values beginning with 1. Other behaviors can be obtained by using special parameters in the command; see its command reference page for more information. @@ -8949,7 +8963,7 @@ SELECT setval('foo', 42, false); Next nextval wi - To avoid blocking of concurrent transactions that obtain numbers from the + To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means @@ -8981,7 +8995,7 @@ SELECT setval('foo', 42, false); Next nextval wi If your needs go beyond the capabilities of these conditional - expressions you might want to consider writing a stored procedure + expressions, you might want to consider writing a stored procedure in a more expressive programming language. @@ -8992,7 +9006,7 @@ SELECT setval('foo', 42, false); Next nextval wi The SQL CASE expression is a generic conditional expression, similar to if/else statements in - other languages: + other programming languages: CASE WHEN condition THEN result @@ -9004,12 +9018,12 @@ END CASE clauses can be used wherever an expression is valid. 
condition is an expression that returns a boolean result. If the result is true - then the value of the CASE expression is the - result that follows the condition. If the result is false any + the value of the CASE expression is the + result that follows the condition. If the result is false subsequent WHEN clauses are searched in the same manner. If no WHEN condition is true then the value of the - case expression is the result in the + case expression is the result of the ELSE clause. If the ELSE clause is omitted and no condition matches, the result is null. @@ -9044,12 +9058,12 @@ SELECT a, The data types of all the result expressions must be convertible to a single output type. - See for more detail. + See for more details. - The following simple CASE expression is a - specialized variant of the general form above: + The following CASE expression is a + variant of the general form above: CASE expression @@ -9061,9 +9075,9 @@ END The expression is computed and compared to - all the value specifications in the + all the values in the WHEN clauses until one is found that is equal. If - no match is found, the result in the + no match is found, the result of the ELSE clause (or a null value) is returned. This is similar to the switch statement in C. @@ -9088,8 +9102,8 @@ SELECT a, - A CASE expression does not evaluate any subexpressions - that are not needed to determine the result. For example, this is a + A CASE expression evaluates any subexpressions + that are needed to determine the result. For example, this is a possible way of avoiding a division-by-zero failure: SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END; @@ -9127,8 +9141,8 @@ SELECT COALESCE(description, short_description, '(none)') ... - Like a CASE expression, COALESCE will - not evaluate arguments that are not needed to determine the result; + Like a CASE expression, COALESCE only + evaluates arguments that are needed to determine the result; that is, arguments to the right of the first non-null argument are not evaluated. This SQL-standard function provides capabilities similar to NVL and IFNULL, which are used in some other @@ -9149,8 +9163,8 @@ SELECT COALESCE(description, short_description, '(none)') ... The NULLIF function returns a null value if - value1 and value2 - are equal; otherwise it returns value1. + value1 equals value2; + otherwise it returns value1. This can be used to perform the inverse operation of the COALESCE example given above: @@ -9335,7 +9349,7 @@ SELECT NULLIF(value, '(none)') ... shows the functions available for use with array types. See - for more discussion and examples of the use of these functions. + for more information and examples of the use of these functions. @@ -9485,7 +9499,7 @@ SELECT NULLIF(value, '(none)') ... text - concatenates array elements using provided delimiter + concatenates array elements using supplied delimiter array_to_string(ARRAY[1, 2, 3], '~^~') 1~^~2~^~3 @@ -9507,7 +9521,7 @@ SELECT NULLIF(value, '(none)') ... text[] - splits string into array elements using provided delimiter + splits string into array elements using supplied delimiter string_to_array('xx~^~yy~^~zz', '~^~') {xx,yy,zz} @@ -9542,7 +9556,7 @@ SELECT NULLIF(value, '(none)') ... Aggregate functions compute a single result - value from a set of input values. The built-in aggregate functions + from a set of input values. The built-in aggregate functions are listed in and . @@ -9595,7 +9609,7 @@ SELECT NULLIF(value, '(none)') ... 
precision, numeric, or interval - numeric for any integer type argument, + numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type @@ -9787,8 +9801,8 @@ SELECT NULLIF(value, '(none)') ... SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...; - Here ANY can be considered both as leading - to a subquery or as an aggregate if the select expression returns 1 row. + Here ANY can be considered as leading either + to a subquery or to an aggregate, if the select expression returns one row. Thus the standard name cannot be given to these aggregates. @@ -9796,14 +9810,14 @@ SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...; Users accustomed to working with other SQL database management - systems might be surprised by the performance of the + systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like: SELECT count(*) FROM sometable; will be executed by PostgreSQL using a - sequential scan of the entire table. + sequential scan of an entire table. @@ -10517,17 +10531,17 @@ EXISTS (subquery) - The subquery will generally only be executed far enough to determine + The subquery will generally only be executed long enough to determine whether at least one row is returned, not all the way to completion. - It is unwise to write a subquery that has any side effects (such as - calling sequence functions); whether the side effects occur or not - might be difficult to predict. + It is unwise to write a subquery that has side effects (such as + calling sequence functions); whether the side effects occur + might be unpredictable. Since the result depends only on whether any rows are returned, and not on the contents of those rows, the output list of the - subquery is normally uninteresting. A common coding convention is + subquery is normally unimportant. A common coding convention is to write all EXISTS tests in the form EXISTS(SELECT 1 WHERE ...). There are exceptions to this rule however, such as subqueries that use INTERSECT. @@ -10536,10 +10550,11 @@ EXISTS (subquery) This simple example is like an inner join on col2, but it produces at most one output row for each tab1 row, - even if there are multiple matching tab2 rows: + even if there are several matching tab2 rows: -SELECT col1 FROM tab1 - WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2); +SELECT col1 +FROM tab1 +WHERE EXISTS (SELECT 1 FROM tab2 WHERE col2 = tab1.col2); @@ -10556,7 +10571,7 @@ SELECT col1 FROM tab1 subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is true if any equal subquery row is found. - The result is false if no equal row is found (including the special + The result is false if no equal row is found (including the case where the subquery returns no rows). @@ -10585,7 +10600,7 @@ SELECT col1 FROM tab1 expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of IN is true if any equal subquery row is found. - The result is false if no equal row is found (including the special + The result is false if no equal row is found (including the case where the subquery returns no rows). @@ -10612,7 +10627,7 @@ SELECT col1 FROM tab1 subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. 
The result of NOT IN is true if only unequal subquery rows - are found (including the special case where the subquery returns no rows). + are found (including the case where the subquery returns no rows). The result is false if any equal row is found. @@ -10641,7 +10656,7 @@ SELECT col1 FROM tab1 expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. The result of NOT IN is true if only unequal subquery rows - are found (including the special case where the subquery returns no rows). + are found (including the case where the subquery returns no rows). The result is false if any equal row is found. @@ -10671,7 +10686,7 @@ SELECT col1 FROM tab1 given operator, which must yield a Boolean result. The result of ANY is true if any true result is obtained. - The result is false if no true result is found (including the special + The result is false if no true result is found (including the case where the subquery returns no rows). @@ -10709,7 +10724,7 @@ SELECT col1 FROM tab1 The result of ANY is true if the comparison returns true for any subquery row. The result is false if the comparison returns false for every - subquery row (including the special case where the subquery returns no + subquery row (including the case where the subquery returns no rows). The result is NULL if the comparison does not return true for any row, and it returns NULL for at least one row. @@ -10735,7 +10750,7 @@ SELECT col1 FROM tab1 given operator, which must yield a Boolean result. The result of ALL is true if all rows yield true - (including the special case where the subquery returns no rows). + (including the case where the subquery returns no rows). The result is false if any false result is found. The result is NULL if the comparison does not return false for any row, and it returns NULL for at least one row. @@ -10763,7 +10778,7 @@ SELECT col1 FROM tab1 evaluated and compared row-wise to each row of the subquery result, using the given operator. The result of ALL is true if the comparison - returns true for all subquery rows (including the special + returns true for all subquery rows (including the case where the subquery returns no rows). The result is false if the comparison returns false for any subquery row. @@ -10855,7 +10870,7 @@ SELECT col1 FROM tab1 The forms involving array subexpressions are PostgreSQL extensions; the rest are SQL-compliant. - All of the expression forms documented in this section return + All of the expressions documented in this section return Boolean (true/false) results. @@ -10926,7 +10941,7 @@ AND x NOT IN y is equivalent to NOT (x IN y) in all cases. However, null values are much more likely to trip up the novice when working with NOT IN than when working with IN. - It's best to express your condition positively if possible. + It is best to express your condition positively if possible. @@ -10947,7 +10962,7 @@ AND given operator, which must yield a Boolean result. The result of ANY is true if any true result is obtained. - The result is false if no true result is found (including the special + The result is false if no true result is found (including the case where the array has zero elements). @@ -10983,7 +10998,7 @@ AND given operator, which must yield a Boolean result. The result of ALL is true if all comparisons yield true - (including the special case where the array has zero elements). + (including the case where the array has zero elements). The result is false if any false result is found. 
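A few self-contained examples may help make the array forms of ANY and ALL concrete (a sketch; psql displays boolean results as t or f):

SELECT 4 = ANY (ARRAY[1,2,3,4]);    -- t: at least one element equals 4
SELECT 10 <> ALL (ARRAY[4,5,6]);    -- t: 10 is unequal to every element
SELECT 5 > ALL (ARRAY[1,2,3]);      -- t: 5 exceeds every element
SELECT 1 > ALL (ARRAY[]::int[]);    -- t: trivially true for zero elements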
@@ -11066,8 +11081,8 @@ AND This construct is similar to a <> row comparison, but it does not yield null for null inputs. Instead, any null value is considered unequal to (distinct from) any non-null value, and any two - nulls are considered equal (not distinct). Thus the result will always - be either true or false, never null. + nulls are considered equal (not distinct). Thus the result will + either be true or false, never null. @@ -11173,7 +11188,7 @@ AND Zero rows are also returned for NULL inputs. It is an error for step to be zero. Some examples follow: -select * from generate_series(2,4); +SELECT * FROM generate_series(2,4); generate_series ----------------- 2 @@ -11181,7 +11196,7 @@ select * from generate_series(2,4); 4 (3 rows) -select * from generate_series(5,1,-2); +SELECT * FROM generate_series(5,1,-2); generate_series ----------------- 5 @@ -11189,13 +11204,13 @@ select * from generate_series(5,1,-2); 1 (3 rows) -select * from generate_series(4,3); +SELECT * FROM generate_series(4,3); generate_series ----------------- (0 rows) -- this example relies on the date-plus-integer operator -select current_date + s.a as dates from generate_series(0,14,7) as s(a); +SELECT current_date + s.a AS dates FROM generate_series(0,14,7) AS s(a); dates ------------ 2004-02-05 @@ -11203,7 +11218,7 @@ select current_date + s.a as dates from generate_series(0,14,7) as s(a); 2004-02-19 (3 rows) -select * from generate_series('2008-03-01 00:00'::timestamp, +SELECT * FROM generate_series('2008-03-01 00:00'::timestamp, '2008-03-04 12:00', '10 hours'); generate_series --------------------- @@ -11490,7 +11505,7 @@ postgres=# select * from unnest2(array[[1,2],[3,4]]); the current database connection; but superusers can change this setting with . The current_user is the user identifier - that is applicable for permission checking. Normally, it is equal + that is applicable for permission checking. Normally it is equal to the session user, but it can be changed with . It also changes during the execution of @@ -11512,13 +11527,13 @@ postgres=# select * from unnest2(array[[1,2],[3,4]]); current_schema returns the name of the schema that is - at the front of the search path (or a null value if the search path is + first in the search path (or a null value if the search path is empty). This is the schema that will be used for any tables or other named objects that are created without specifying a target schema. current_schemas(boolean) returns an array of the names of all schemas presently in the search path. The Boolean option determines whether or not - implicitly included system schemas such as pg_catalog are included in the search - path returned. + implicitly included system schemas such as pg_catalog are included in the + returned search path. @@ -11567,10 +11582,10 @@ SET search_path TO schema , schema, .. pg_my_temp_schema returns the OID of the current - session's temporary schema, or 0 if it has none (because it has not - created any temporary tables). + session's temporary schema, or 0 if it has none (because no + temporary tables have been created). pg_is_other_temp_schema returns true if the - given OID is the OID of any other session's temporary schema. + given OID is the OID of another session's temporary schema. (This can be useful, for example, to exclude other sessions' temporary tables from a catalog display.) 
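For example, a catalog query along the following lines (a sketch of the use case just mentioned) lists relations while skipping those that belong to some other session's temporary schema:

SELECT relname FROM pg_class
WHERE NOT pg_is_other_temp_schema(relnamespace);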
@@ -11864,8 +11879,8 @@ SELECT has_table_privilege('joe', 'mytable', 'INSERT, SELECT WITH GRANT OPTION') has_any_column_privilege checks whether a user can - access any column of a table in a particular way. The possibilities for - its arguments are the same as for has_table_privilege, + access any column of a table in a particular way; its argument possibilities + are analogous to has_table_privilege, except that the desired access privilege type must evaluate to some combination of SELECT, @@ -11881,8 +11896,8 @@ SELECT has_table_privilege('joe', 'mytable', 'INSERT, SELECT WITH GRANT OPTION') has_column_privilege checks whether a user - can access a column in a particular way. The possibilities for its - arguments are analogous to has_table_privilege, + can access a column in a particular way; its argument possibilities + are analogous to has_table_privilege, with the addition that the column can be specified either by name or attribute number. The desired access privilege type must evaluate to some combination of @@ -11895,8 +11910,8 @@ SELECT has_table_privilege('joe', 'mytable', 'INSERT, SELECT WITH GRANT OPTION') has_database_privilege checks whether a user - can access a database in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a database in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to some combination of CREATE, CONNECT, @@ -11907,8 +11922,8 @@ SELECT has_table_privilege('joe', 'mytable', 'INSERT, SELECT WITH GRANT OPTION') has_function_privilege checks whether a user - can access a function in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a function in a particular way; its argument possibilities + are analogous to has_table_privilege. When specifying a function by a text string rather than by OID, the allowed input is the same as for the regprocedure data type (see ). @@ -11922,24 +11937,24 @@ SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute'); has_foreign_data_wrapper_privilege checks whether a user - can access a foreign-data wrapper in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a foreign-data wrapper in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE. has_language_privilege checks whether a user - can access a procedural language in a particular way. The possibilities - for its arguments are analogous to has_table_privilege. + can access a procedural language in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE. has_schema_privilege checks whether a user - can access a schema in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a schema in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to some combination of CREATE or USAGE. @@ -11947,24 +11962,24 @@ SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute'); has_server_privilege checks whether a user - can access a foreign server in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. 
+ can access a foreign server in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to USAGE. has_tablespace_privilege checks whether a user - can access a tablespace in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a tablespace in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to CREATE. pg_has_role checks whether a user - can access a role in a particular way. The possibilities for its - arguments are analogous to has_table_privilege. + can access a role in a particular way; its argument possibilities + are analogous to has_table_privilege. The desired access privilege type must evaluate to some combination of MEMBER or USAGE. @@ -12111,8 +12126,8 @@ SELECT relname FROM pg_class WHERE pg_table_is_visible(oid); SELECT pg_type_is_visible('myschema.widget'::regtype); - Note that it would not make much sense to test an unqualified name in - this way — if the name can be recognized at all, it must be visible. + Note that it would not make much sense to test a non-schema-qualified + type name in this way — if the name can be recognized at all, it must be visible. @@ -12280,7 +12295,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); pg_get_userbyid(roleid) name - get role name with given ID + get role name with given OID pg_get_viewdef(view_name) @@ -12374,11 +12389,11 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); as a double-quoted identifier, meaning it is lowercased by default, while the second parameter, being just a column name, is treated as double-quoted and has its case preserved. The function returns a value - suitably formatted for passing to the sequence functions (see ). This association can be modified or removed with ALTER SEQUENCE OWNED BY. (The function probably should have been called - pg_get_owned_sequence; its name reflects the fact + pg_get_owned_sequence; its current name reflects the fact that it's typically used with serial or bigserial columns.) @@ -12442,7 +12457,7 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); The functions shown in extract comments previously stored with the command. A null value is returned if no - comment could be found matching the specified parameters. + comment could be found for the specified parameters. @@ -12489,16 +12504,16 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); comment for a database object specified by its OID and the name of the containing system catalog. For example, obj_description(123456,'pg_class') - would retrieve the comment for a table with OID 123456. + would retrieve the comment for the table with OID 123456. The one-parameter form of obj_description requires only - the object OID. It is now deprecated since there is no guarantee that + the object OID. It is deprecated since there is no guarantee that OIDs are unique across different system catalogs; therefore, the wrong - comment could be returned. + comment might be returned. shobj_description is used just like - obj_description only that it is used for retrieving + obj_description except it is used for retrieving comments on shared objects. Some system catalogs are global to all databases within each cluster and their descriptions are stored globally as well. 
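As a brief illustration (mytable here is a hypothetical table that has been given a comment):

COMMENT ON TABLE mytable IS 'Employee master table';
SELECT obj_description('mytable'::regclass, 'pg_class');

    obj_description
-----------------------
 Employee master table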
@@ -12530,7 +12545,7 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); The functions shown in - export server internal transaction information to user level. The main + export server transaction information. The main use of these functions is to determine which transactions were committed between two snapshots. @@ -12578,10 +12593,10 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33);
- The internal transaction ID type (xid) is 32 bits wide and so - it wraps around every 4 billion transactions. However, these functions + The internal transaction ID type (xid) is 32 bits wide and + wraps around every 4 billion transactions. However, these functions export a 64-bit format that is extended with an epoch counter - so that it will not wrap around for the life of an installation. + so it will not wrap around during the life of an installation. The data type used by these functions, txid_snapshot, stores information about transaction ID visibility at a particular moment in time. Its components are @@ -12612,7 +12627,7 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); xmax - First as-yet-unassigned txid. All txids later than this one are + First as-yet-unassigned txid. All txids later than this are not yet started as of the time of the snapshot, and thus invisible. @@ -12664,7 +12679,7 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); current_setting(setting_name) text - current value of setting + get current value of setting @@ -12803,15 +12818,16 @@ SELECT set_config('log_statement_stats', 'off', false); send signals (SIGINT or SIGTERM respectively) to backend processes identified by process ID. The process ID of an active backend can be found from - the procpid column in the + the procpid column of the pg_stat_activity view, or by listing the - postgres processes on the server with - ps. + postgres processes on the server using + ps on Unix or the Task + Manager on Windows. pg_reload_conf sends a SIGHUP signal - to the server, causing the configuration files + to the server, causing configuration files to be reloaded by all server processes. @@ -12874,7 +12890,7 @@ SELECT set_config('log_statement_stats', 'off', false); pg_stop_backup() text - Finish performing on-line backup + Finalize after performing on-line backup @@ -12916,14 +12932,14 @@ SELECT set_config('log_statement_stats', 'off', false); - pg_start_backup accepts a text parameter which is an + pg_start_backup accepts an arbitrary user-defined label for the backup. (Typically this would be the name under which the backup dump file will be stored.) The function - writes a backup label file into the database cluster's data directory, - performs a checkpoint, + writes a backup label file (backup_label) into the + database cluster's data directory, performs a checkpoint, and then returns the backup's starting transaction log location as text. - The user need not pay any attention to this result value, but it is - provided in case it is of use. + The user can ignore this result value, but it is + provided in case it is useful. postgres=# select pg_start_backup('label_goes_here'); pg_start_backup @@ -12939,12 +12955,13 @@ postgres=# select pg_start_backup('label_goes_here'); pg_stop_backup removes the label file created by - pg_start_backup, and instead creates a backup history file in + pg_start_backup, and creates a backup history file in the transaction log archive area. The history file includes the label given to pg_start_backup, the starting and ending transaction log locations for the backup, and the starting and ending times of the backup. The return - value is the backup's ending transaction log location (which again might be of little - interest). After noting the ending location, the current transaction log insertion + value is the backup's ending transaction log location (which again + can be ignored). 
After recording the ending location, the current + transaction log insertion point is automatically advanced to the next transaction log file, so that the ending transaction log file can be archived immediately to complete the backup. @@ -12952,7 +12969,7 @@ postgres=# select pg_start_backup('label_goes_here'); pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). - The result is the ending transaction log location + 1 within the just-completed transaction log file. + The return value is the ending transaction log location + 1 within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the start location of the transaction log file currently in use. @@ -12960,7 +12977,7 @@ postgres=# select pg_start_backup('label_goes_here'); pg_current_xlog_location displays the current transaction log write - location in the same format used by the above functions. Similarly, + location in the format used by the above functions. Similarly, pg_current_xlog_insert_location displays the current transaction log insertion point. The insertion point is the logical end of the transaction log @@ -12978,7 +12995,7 @@ postgres=# select pg_start_backup('label_goes_here'); corresponding transaction log file name and byte offset from the results of any of the above functions. For example: -postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); +postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); file_name | file_offset --------------------------+------------- 00000001000000000000000D | 4039624 @@ -12999,7 +13016,7 @@ postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); The functions shown in calculate - the actual disk space usage of database objects. + the disk space usage of database objects. @@ -13057,7 +13074,7 @@ postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); Disk space used by the specified fork, 'main' or 'fsm', of a table or index with the specified OID - or name. The table name can be qualified with a schema name + or name; the table name can be schema-qualified. @@ -13097,8 +13114,8 @@ postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); bigint Total disk space used by the table with the specified OID or name, - including indexes and toasted data. The table name can be - qualified with a schema name + including indexes and TOAST data; the table name can be + schema-qualified. @@ -13140,10 +13157,10 @@ postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); The functions shown in provide native file access to + linkend="functions-admin-genfile"> provide native access to files on the machine hosting the server. Only files within the database cluster directory and the log_directory can be - accessed. Use a relative path for files within the cluster directory, + accessed. Use a relative path for files in the cluster directory, and a path matching the log_directory configuration setting for log files. Use of these functions is restricted to superusers. @@ -13209,7 +13226,7 @@ postgres=# select * from pg_xlogfile_name_offset(pg_stop_backup()); size, last accessed time stamp, last modified time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a boolean - indicating if it is a directory. Typical usages include: + indicating if it is a directory. 
Typical usages include: SELECT * FROM pg_stat_file('filename'); SELECT (pg_stat_file('filename')).modification; The functions shown in manage - advisory locks. For details about proper use of these functions, see . @@ -13366,7 +13383,7 @@ SELECT (pg_stat_file('filename')).modification; pg_advisory_lock, except the function will not wait for the lock to become available. It will either obtain the lock immediately and return true, or return false if the lock cannot be - acquired now. + acquired immediately.
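For example, using an arbitrary application-chosen key (12345 is purely illustrative; a second session issuing the same call while the lock is held would get f):

SELECT pg_try_advisory_lock(12345);   -- t if the lock was obtained
SELECT pg_advisory_unlock(12345);     -- t if the lock was released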
@@ -13375,7 +13392,7 @@ SELECT (pg_stat_file('filename')).modification; pg_try_advisory_lock_shared works the same as pg_try_advisory_lock, except it attempts to acquire - shared rather than exclusive lock. + a shared rather than an exclusive lock. @@ -13384,8 +13401,8 @@ SELECT (pg_stat_file('filename')).modification; pg_advisory_unlock will release a previously-acquired exclusive advisory lock. It - will return true if the lock is successfully released. - If the lock was in fact not held, it will return false, + returns true if the lock is successfully released. + If the lock was not held, it will return false, and in addition, an SQL warning will be raised by the server. @@ -13395,7 +13412,7 @@ SELECT (pg_stat_file('filename')).modification; pg_advisory_unlock_shared works the same as pg_advisory_unlock, - except to release a shared advisory lock. + except is releases a shared advisory lock. @@ -13404,7 +13421,7 @@ SELECT (pg_stat_file('filename')).modification; pg_advisory_unlock_all will release all advisory locks held by the current session. (This function is implicitly invoked - at session end, even if the client disconnects ungracefully.) + at session end, even if the client disconnects abruptly.) diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index 480d345cf0..efd4adced7 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1,4 +1,4 @@ - + High Availability, Load Balancing, and Replication @@ -414,7 +414,7 @@ protocol to make nodes agree on a serializable transactional order. Data partitioning splits tables into data sets. Each set can be modified by only one server. For example, data can be - partitioned by offices, e.g. London and Paris, with a server + partitioned by offices, e.g., London and Paris, with a server in each office. If queries combining London and Paris data are necessary, an application can query both servers, or master/slave replication can be used to keep a read-only copy diff --git a/doc/src/sgml/history.sgml b/doc/src/sgml/history.sgml index 8098470262..c2ae854d8f 100644 --- a/doc/src/sgml/history.sgml +++ b/doc/src/sgml/history.sgml @@ -1,4 +1,4 @@ - + A Brief History of <productname>PostgreSQL</productname> @@ -12,7 +12,7 @@ The object-relational database management system now known as PostgreSQL is derived from the POSTGRES package written at the - University of California at Berkeley. With over a decade of + University of California at Berkeley. With over two decades of development behind it, PostgreSQL is now the most advanced open-source database available anywhere. @@ -93,7 +93,7 @@ - In 1994, Andrew Yu and Jolly Chen added a SQL language interpreter + In 1994, Andrew Yu and Jolly Chen added an SQL language interpreter to POSTGRES. Under a new name, Postgres95 was subsequently released to the web to find its own way in the world as an open-source diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index d3d1baed24..e40724df17 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -1,4 +1,4 @@ - + Indexes @@ -27,35 +27,35 @@ CREATE TABLE test1 ( content varchar );
- and the application requires a lot of queries of the form: + and the application issues many queries of the form: SELECT content FROM test1 WHERE id = constant; With no advance preparation, the system would have to scan the entire test1 table, row by row, to find all - matching entries. If there are a lot of rows in - test1 and only a few rows (perhaps only zero - or one) that would be returned by such a query, then this is clearly an - inefficient method. But if the system has been instructed to maintain an - index on the id column, then it can use a more + matching entries. If there are many rows in + test1 and only a few rows (perhaps zero + or one) that would be returned by such a query, this is clearly an + inefficient method. But if the system maintains an + index on the id column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree. - A similar approach is used in most books of non-fiction: terms and + A similar approach is used in most non-fiction books: terms and concepts that are frequently looked up by readers are collected in an alphabetic index at the end of the book. The interested reader can scan the index relatively quickly and flip to the appropriate page(s), rather than having to read the entire book to find the material of interest. Just as it is the task of the author to - anticipate the items that the readers are likely to look up, + anticipate the items that readers are likely to look up, it is the task of the database programmer to foresee which indexes - will be of advantage. + will be useful. - The following command would be used to create the index on the + The following command can be used to create an index on the id column, as discussed: CREATE INDEX test1_id_index ON test1 (id); @@ -73,7 +73,7 @@ CREATE INDEX test1_id_index ON test1 (id); Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will - use the index in queries when it thinks this would be more efficient + use the index in queries when it thinks it would be more efficient than a sequential table scan. But you might have to run the ANALYZE command regularly to update statistics to allow the query planner to make educated decisions. @@ -87,14 +87,14 @@ CREATE INDEX test1_id_index ON test1 (id); DELETE commands with search conditions. Indexes can moreover be used in join searches. Thus, an index defined on a column that is part of a join condition can - significantly speed up queries with joins. + also significantly speed up queries with joins. Creating an index on a large table can take a long time. By default, PostgreSQL allows reads (selects) to occur - on the table in parallel with creation of an index, but writes (inserts, - updates, deletes) are blocked until the index build is finished. + on the table in parallel with index creation, but writes (INSERTs, + UPDATEs, DELETEs) are blocked until the index build is finished. In production environments this is often unacceptable. It is possible to allow writes to occur in parallel with index creation, but there are several caveats to be aware of — @@ -118,8 +118,8 @@ CREATE INDEX test1_id_index ON test1 (id); PostgreSQL provides several index types: B-tree, Hash, GiST and GIN. Each index type uses a different algorithm that is best suited to different types of queries. - By default, the CREATE INDEX command will create a - B-tree index, which fits the most common situations. 
+ By default, the CREATE INDEX command creates + B-tree indexes, which fit the most common situations. @@ -159,11 +159,11 @@ CREATE INDEX test1_id_index ON test1 (id); 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your database does not use the C locale you will need to create the index with a special - operator class to support indexing of pattern-matching queries. See + operator class to support indexing of pattern-matching queries; see below. It is also possible to use B-tree indexes for ILIKE and ~*, but only if the pattern starts with - non-alphabetic characters, i.e. characters that are not affected by + non-alphabetic characters, i.e., characters that are not affected by upper/lower case conversion. @@ -180,7 +180,7 @@ CREATE INDEX test1_id_index ON test1 (id); Hash indexes can only handle simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the - = operator. (But hash indexes do not support + = operator. (Hash indexes do not support IS NULL searches.) The following command is used to create a hash index: @@ -290,11 +290,11 @@ CREATE TABLE test2 ( ); (say, you keep your /dev - directory in a database...) and you frequently make queries like: + directory in a database...) and you frequently issue queries like: SELECT name FROM test2 WHERE major = constant AND minor = constant; - then it might be appropriate to define an index on the columns + then it might be appropriate to define an index on columns major and minor together, e.g.: @@ -359,7 +359,7 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor); Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized. See also for some discussion of the - merits of different index setups. + merits of different index configurations. @@ -375,7 +375,7 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor); In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. - This allows a query's ORDER BY specification to be met + This allows a query's ORDER BY specification to be honored without a separate sorting step. Of the index types currently supported by PostgreSQL, only B-tree can produce sorted output — the other index types return @@ -384,22 +384,23 @@ CREATE INDEX test2_mm_idx ON test2 (major, minor); The planner will consider satisfying an ORDER BY specification - either by scanning any available index that matches the specification, + by either scanning an available index that matches the specification, or by scanning the table in physical order and doing an explicit sort. For a query that requires scanning a large fraction of the - table, the explicit sort is likely to be faster because it requires - less disk I/O due to a better-ordered access pattern. Indexes are + table, the explicit sort is likely to be faster than using an index + because it requires + less disk I/O due to a sequential access pattern. Indexes are more useful when only a few rows need be fetched. An important special case is ORDER BY in combination with LIMIT n: an explicit sort will have to process - all the data to identify the first n rows, but if there is - an index matching the ORDER BY then the first n + all data to identify the first n rows, but if there is + an index matching the ORDER BY, the first n rows can be retrieved directly, without scanning the remainder at all. 
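To make the ORDER BY plus LIMIT case concrete, a query shaped like the following (a sketch reusing the test2 table and test2_mm_idx index from earlier; whether the planner actually chooses the index depends on costs and statistics) can be answered from the index without a separate sort step:

SELECT * FROM test2
ORDER BY major, minor
LIMIT 10;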
By default, B-tree indexes store their entries in ascending order - with nulls last. This means that a forward scan of an index on a + with nulls last. This means that a forward scan of an index on column x produces output satisfying ORDER BY x (or more verbosely, ORDER BY x ASC NULLS LAST). The index can also be scanned backward, producing output satisfying @@ -432,14 +433,14 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST); ORDER BY x DESC, y DESC if we scan backward. But it might be that the application frequently needs to use ORDER BY x ASC, y DESC. There is no way to get that - ordering from a regular index, but it is possible if the index is defined + ordering from a simpler index, but it is possible if the index is defined as (x ASC, y DESC) or (x DESC, y ASC). Obviously, indexes with non-default sort orderings are a fairly specialized feature, but sometimes they can produce tremendous - speedups for certain queries. Whether it's worth keeping such an + speedups for certain queries. Whether it's worth creating such an index depends on how often you use queries that require a special sort ordering. @@ -468,7 +469,7 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST); - Beginning in release 8.1, + Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans. The system can form AND @@ -513,7 +514,7 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST); more efficient than index combination for queries involving both columns, but as discussed in , it would be almost useless for queries involving only y, so it - could not be the only index. A combination of the multicolumn index + should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on @@ -547,16 +548,16 @@ CREATE UNIQUE INDEX name ON table When an index is declared unique, multiple table rows with equal - indexed values will not be allowed. Null values are not considered + indexed values are not allowed. Null values are not considered equal. A multicolumn unique index will only reject cases where all - of the indexed columns are equal in two rows. + indexed columns are equal in multiple rows. PostgreSQL automatically creates a unique - index when a unique constraint or a primary key is defined for a table. + index when a unique constraint or primary key is defined for a table. The index covers the columns that make up the primary key or unique - columns (a multicolumn index, if appropriate), and is the mechanism + constraint (a multicolumn index, if appropriate), and is the mechanism that enforces the constraint. @@ -583,9 +584,9 @@ CREATE UNIQUE INDEX name ON table - An index column need not be just a column of the underlying table, + An index column need not be just a column of an underlying table, but can be a function or scalar expression computed from one or - more columns of the table. This feature is useful to obtain fast + more columns of a table. This feature is useful to obtain fast access to tables based on the results of computations. 
@@ -595,9 +596,9 @@ CREATE UNIQUE INDEX name ON table SELECT * FROM test1 WHERE lower(col1) = 'value'; - This query can use an index, if one has been + This query can use an index if one has been defined on the result of the lower(col1) - operation: + function: CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1)); @@ -612,7 +613,7 @@ CREATE INDEX test1_lower_col1_idx ON test1 (lower(col1)); - As another example, if one often does queries like this: + As another example, if one often does queries like: SELECT * FROM people WHERE (first_name || ' ' || last_name) = 'John Smith'; @@ -655,7 +656,7 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name)); A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the - partial index). The index contains entries for only those table + partial index). The index contains entries only for those table rows that satisfy the predicate. Partial indexes are a specialized feature, but there are several situations in which they are useful. @@ -665,8 +666,8 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name)); values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the - index at all. This reduces the size of the index, which will speed - up queries that do use the index. It will also speed up many table + index. A partial index reduces the size of the index, which speeds + up queries that use the index. It will also speed up many table update operations because the index does not need to be updated in all cases. shows a possible application of this idea. @@ -700,39 +701,43 @@ CREATE TABLE access_log ( such as this: CREATE INDEX access_log_client_ip_ix ON access_log (client_ip) - WHERE NOT (client_ip > inet '192.168.100.0' AND client_ip < inet '192.168.100.255'); +WHERE NOT (client_ip > inet '192.168.100.0' AND + client_ip < inet '192.168.100.255'); A typical query that can use this index would be: -SELECT * FROM access_log WHERE url = '/index.html' AND client_ip = inet '212.78.10.32'; +SELECT * +FROM access_log +WHERE url = '/index.html' AND client_ip = inet '212.78.10.32'; A query that cannot use this index is: -SELECT * FROM access_log WHERE client_ip = inet '192.168.100.23'; +SELECT * +FROM access_log +WHERE client_ip = inet '192.168.100.23'; Observe that this kind of partial index requires that the common - values be predetermined. If the distribution of values is - inherent (due to the nature of the application) and static (not - changing over time), this is not difficult, but if the common values are - merely due to the coincidental data load this can require a lot of - maintenance work to change the index definition from time to time. + values be predetermined, so such partial indexes are best used for + data distribution that do not change. The indexes can be recreated + occasionally to adjust for new data distributions, but this adds + maintenance overhead. - Another possible use for a partial index is to exclude values from the + Another possible use for partial indexes is to exclude values from the index that the typical query workload is not interested in; this is shown in . 
This results in the same advantages as listed above, but it prevents the uninteresting values from being accessed via that - index at all, even if an index scan might be profitable in that + index, even if an index scan might be profitable in that case. Obviously, setting up partial indexes for this kind of scenario will require a lot of care and experimentation. @@ -774,7 +779,7 @@ SELECT * FROM orders WHERE billed is not true AND amount > 5000.00; SELECT * FROM orders WHERE order_nr = 3501; - The order 3501 might be among the billed or among the unbilled + The order 3501 might be among the billed or unbilled orders. @@ -799,9 +804,9 @@ SELECT * FROM orders WHERE order_nr = 3501; x < 1 implies x < 2; otherwise the predicate condition must exactly match part of the query's WHERE condition - or the index will not be recognized to be usable. Matching takes + or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, - parameterized query clauses will not work with a partial index. For + parameterized query clauses do not work with a partial index. For example a prepared query with a parameter might specify x < ? which will never imply x < 2 for all possible values of the parameter. @@ -835,7 +840,7 @@ CREATE TABLE tests ( CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target) WHERE success; - This is a particularly efficient way of doing it when there are few + This is a particularly efficient approach when there are few successful tests and many unsuccessful ones. @@ -859,7 +864,7 @@ CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target) know when an index might be profitable. Forming this knowledge requires experience and understanding of how indexes in PostgreSQL work. In most cases, the advantage of a - partial index over a regular index will not be much. + partial index over a regular index will be minimal. @@ -892,7 +897,7 @@ CREATE INDEX name ON table would use the int4_ops class; this operator class includes comparison functions for values of type int4. In practice the default operator class for the column's data type is - usually sufficient. The main point of having operator classes is + usually sufficient. The main reason for having operator classes is that for some data types, there could be more than one meaningful index behavior. For example, we might want to sort a complex-number data type either by absolute value or by real part. We could do this by @@ -931,7 +936,7 @@ CREATE INDEX test_index ON test_table (col varchar_pattern_ops); to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these - operator classes, however.) It is allowed to create multiple + operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes. If you do use the C locale, you do not need the xxx_pattern_ops @@ -990,7 +995,7 @@ SELECT am.amname AS index_method, Although indexes in PostgreSQL do not need - maintenance and tuning, it is still important to check + maintenance or tuning, it is still important to check which indexes are actually used by the real-life query workload. Examining index usage for an individual query is done with the @@ -1002,10 +1007,10 @@ SELECT am.amname AS index_method, It is difficult to formulate a general procedure for determining - which indexes to set up. There are a number of typical cases that + which indexes to create. 
There are a number of typical cases that have been shown in the examples throughout the previous sections. - A good deal of experimentation will be necessary in most cases. - The rest of this section gives some tips for that. + A good deal of experimentation is often necessary. + The rest of this section gives some tips for that: @@ -1014,7 +1019,7 @@ SELECT am.amname AS index_method, Always run first. This command collects statistics about the distribution of the values in the - table. This information is required to guess the number of rows + table. This information is required to estimate the number of rows returned by a query, which is needed by the planner to assign realistic costs to each possible query plan. In absence of any real statistics, some default values are assumed, which are @@ -1035,13 +1040,13 @@ SELECT am.amname AS index_method, It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the - 100 rows will probably fit within a single disk page, and there + 100 rows probably fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page. Also be careful when making up test data, which is often - unavoidable when the application is not in production use yet. + unavoidable when the application is not yet in production. Values that are very similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have. @@ -1058,7 +1063,7 @@ SELECT am.amname AS index_method, (enable_nestloop), which are the most basic plans, will force the system to use a different plan. If the system still chooses a sequential scan or nested-loop join then there is - probably a more fundamental reason why the index is not + probably a more fundamental reason why the index is not being used; for example, the query condition does not match the index. (What kind of query can use what kind of index is explained in the previous sections.) diff --git a/doc/src/sgml/info.sgml b/doc/src/sgml/info.sgml index a33bbc9610..6e358e7719 100644 --- a/doc/src/sgml/info.sgml +++ b/doc/src/sgml/info.sgml @@ -1,4 +1,4 @@ - + Further Information @@ -8,12 +8,17 @@ resources about PostgreSQL: + - FAQs + Wiki - The FAQ list FAQ contains - continuously updated answers to frequently asked questions. + The PostgreSQL wiki contains the project's FAQ + (Frequently Asked Questions) list, TODO list, and + detailed information about many more topics. 
diff --git a/doc/src/sgml/install-win32.sgml b/doc/src/sgml/install-win32.sgml index b5789f5758..6235a1d382 100644 --- a/doc/src/sgml/install-win32.sgml +++ b/doc/src/sgml/install-win32.sgml @@ -1,4 +1,4 @@ - + Installation from Source Code on <productname>Windows</productname> @@ -383,7 +383,7 @@ To build the libpq client library using Visual Studio 7.1 or later, change into the - src directory and type the command + src directory and type the command: nmake /f win32.mak @@ -392,7 +392,7 @@ To build a 64-bit version of the libpq client library using Visual Studio 8.0 or later, change into the src - directory and type in the command + directory and type in the command: nmake /f win32.mak CPU=AMD64 @@ -403,7 +403,7 @@ To build the libpq client library using Borland C++, change into the - src directory and type the command + src directory and type the command: make -N -DCFG=Release /f bcc32.mak diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 9faa6f67ba..e91f34538a 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -1,4 +1,4 @@ - + <![%standalone-include[<productname>PostgreSQL</>]]> @@ -11,7 +11,7 @@ <para> This <![%standalone-include;[document]]> <![%standalone-ignore;[chapter]]> describes the installation of - <productname>PostgreSQL</productname> from the source code + <productname>PostgreSQL</productname> using the source code distribution. (If you are installing a pre-packaged distribution, such as an RPM or Debian package, ignore this <![%standalone-include;[document]]> @@ -75,7 +75,7 @@ su - postgres refer to it by that name. (On some systems <acronym>GNU</acronym> <application>make</> is the default tool with the name <filename>make</>.) To test for <acronym>GNU</acronym> - <application>make</application> enter + <application>make</application> enter: <screen> <userinput>gmake --version</userinput> </screen> @@ -85,9 +85,10 @@ su - postgres <listitem> <para> - You need an <acronym>ISO</>/<acronym>ANSI</> C compiler. Recent + You need an <acronym>ISO</>/<acronym>ANSI</> C compiler (minimum + C89-compliant). Recent versions of <productname>GCC</> are recommendable, but - <productname>PostgreSQL</> is known to build with a wide variety + <productname>PostgreSQL</> is known to build using a wide variety of compilers from different vendors. </para> </listitem> @@ -95,7 +96,7 @@ su - postgres <listitem> <para> <application>tar</> is required to unpack the source - distribution in the first place, in addition to either + distribution, in addition to either <application>gzip</> or <application>bzip2</>. In addition, <application>gzip</> is required to install the documentation. @@ -117,7 +118,7 @@ su - postgres command you type, and allows you to use arrow keys to recall and edit previous commands. This is very helpful and is strongly recommended. If you don't want to use it then you must specify - the <option>--without-readline</option> option for + the <option>--without-readline</option> option of <filename>configure</>. As an alternative, you can often use the BSD-licensed <filename>libedit</filename> library, originally developed on <productname>NetBSD</productname>. The @@ -140,7 +141,7 @@ su - postgres The <productname>zlib</productname> compression library will be used by default. If you don't want to use it then you must - specify the <option>--without-zlib</option> option for + specify the <option>--without-zlib</option> option to <filename>configure</filename>. 
Using this option disables
    support for compressed archives in <application>pg_dump</> and
    <application>pg_restore</>.

@@ -152,7 +153,7 @@ su - postgres
   <para>
    The following packages are optional. They are not required in the
    default configuration, but they are needed when certain build
-   options are enabled, as explained below.
+   options are enabled, as explained below:

    <itemizedlist>
     <listitem>
@@ -172,7 +173,8 @@ su - postgres
      <para>
       If you don't have the shared library but you need one, a message
-      like this will appear during the build to point out this fact:
+      like this will appear during the <productname>PostgreSQL</>
+      build to point out this fact:
<screen>
*** Cannot build PL/Perl because libperl is not a shared library.
*** You might have to rebuild your Perl installation. Refer to
@@ -206,7 +208,7 @@ su - postgres
      <filename>libpython</filename> library must be a shared library
      also on most platforms. This is not the case in a default
      <productname>Python</productname> installation. If after
-      building and installing you have a file called
+      building and installing <productname>PostgreSQL</> you have a file called
      <filename>plpython.so</filename> (possibly a different extension),
      then everything went well. Otherwise you should have seen a notice
      like this flying by:
@@ -216,7 +218,7 @@ su - postgres
*** the documentation for details.
</screen>
      That means you have to rebuild (part of) your
-      <productname>Python</productname> installation to supply this
+      <productname>Python</productname> installation to create this
      shared library.
     </para>

@@ -272,7 +274,7 @@ su - postgres
  <para>
   If you are building from a <acronym>CVS</acronym> tree instead of
-  using a released source package, or if you want to do development,
+  using a released source package, or if you want to do server development,
   you also need the following packages:

   <itemizedlist>
@@ -314,7 +316,7 @@ su - postgres
   Also check that you have sufficient disk space. You will need about
   65 MB for the source tree during compilation and about 15 MB for
   the installation directory. An empty database cluster takes about
-  25 MB, databases take about five times the amount of space that a
+  25 MB; databases take about five times the amount of space that a
   flat text file with the same data would take. If you are going
   to run the regression tests you will temporarily need up to an extra
   90 MB. Use the <command>df</command> command to check free disk
   space.
@@ -420,7 +422,7 @@ su - postgres
   On systems that have <productname>PostgreSQL</> started at boot time,
   there is probably a start-up file that will accomplish the same thing. For
   example, on a <systemitem class="osname">Red Hat Linux</> system one
-  might find that
+  might find that:
<screen>
<userinput>/etc/rc.d/init.d/postgresql stop</userinput>
</screen>
@@ -469,7 +471,7 @@ su - postgres
    <step>
     <para>
-     Start the database server, again from the special database user
+     Start the database server, again using the special database user
      account:
<programlisting>
<userinput>/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data</>
</programlisting>
     </para>
    </step>

    <step>
     <para>
-     Finally, restore your data from backup with
+     Finally, restore your data from backup with:
<screen>
<userinput>/usr/local/pgsql/bin/psql -d postgres -f <replaceable>outputfile</></userinput>
</screen>
     </para>
@@ -514,12 +516,12 @@ su - postgres
   The first step of the installation procedure is to configure the
   source tree for your system and choose the options you would like.
   This is done by running the <filename>configure</> script.
For a - default installation simply enter + default installation simply enter: <screen> <userinput>./configure</userinput> </screen> - This script will run a number of tests to guess values for various - system dependent variables and detect some quirks of your + This script will run a number of tests to determine values for various + system dependent variables and detect any quirks of your operating system, and finally will create several files in the build tree to record what it found. (You can also run <filename>configure</filename> in a directory outside the source @@ -719,7 +721,7 @@ su - postgres internal header files and the server header files are installed into private directories under <varname>includedir</varname>. See the documentation of each interface for information about how to - get at the its header files. Finally, a private subdirectory will + access its header files. Finally, a private subdirectory will also be created, if appropriate, under <varname>libdir</varname> for dynamically loadable modules. </para> @@ -769,7 +771,7 @@ su - postgres Enables Native Language Support (<acronym>NLS</acronym>), that is, the ability to display a program's messages in a language other than English. - <replaceable>LANGUAGES</replaceable> is a space-separated + <replaceable>LANGUAGES</replaceable> is an optional space-separated list of codes of the languages that you want supported, for example <literal>--enable-nls='de fr'</>. (The intersection between your list and the set of actually provided @@ -927,11 +929,11 @@ su - postgres and libpq]]><![%standalone-ignore[<xref linkend="libpq-ldap"> and <xref linkend="auth-ldap">]]> for more information). On Unix, this requires the <productname>OpenLDAP</> package to be - installed. <filename>configure</> will check for the required + installed. On Windows, the default <productname>WinLDAP</> + library is used. <filename>configure</> will check for the required header files and libraries to make sure that your <productname>OpenLDAP</> installation is sufficient before - proceeding. On Windows, the default <productname>WinLDAP</> - library is used. + proceeding. </para> </listitem> </varlistentry> @@ -1225,7 +1227,7 @@ su - postgres <listitem> <para> Compiles all programs and libraries with debugging symbols. - This means that you can run the programs through a debugger + This means that you can run the programs in a debugger to analyze problems. This enlarges the size of the installed executables considerably, and on non-GCC compilers it usually also disables compiler optimization, causing slowdowns. However, @@ -1293,7 +1295,7 @@ su - postgres be rebuilt when any header file is changed. This is useful if you are doing development work, but is just wasted overhead if you intend only to compile once and install. At present, - this option will work only if you use GCC. + this option only works with GCC. </para> </listitem> </varlistentry> @@ -1510,13 +1512,13 @@ su - postgres <title>Build - To start the build, type + To start the build, type: gmake (Remember to use GNU make.) The build will take a few minutes depending on your - hardware. The last line displayed should be + hardware. The last line displayed should be: All of PostgreSQL is successfully made. Ready to install. @@ -1535,7 +1537,7 @@ All of PostgreSQL is successfully made. Ready to install. you can run the regression tests at this point. The regression tests are a test suite to verify that PostgreSQL runs on your machine in the way the developers expected it - to. Type + to. 
Type: gmake check @@ -1550,7 +1552,7 @@ All of PostgreSQL is successfully made. Ready to install.
-  Installing The Files
+  Installing the Files

@@ -1562,14 +1564,14 @@ All of PostgreSQL is successfully made. Ready to install.

-   To install PostgreSQL enter
+   To install PostgreSQL enter:

gmake install

    This will install files into the directories that were specified
    in . Make sure that you have appropriate
    permissions to write into that area. Normally you need to do this
-   step as root. Alternatively, you could create the target
+   step as root. Alternatively, you can create the target
    directories in advance and arrange for appropriate permissions to
    be granted.

@@ -1639,14 +1641,14 @@ All of PostgreSQL is successfully made. Ready to install.
    Cleaning:

-   After the installation you can make room by removing the built
+   After the installation you can free disk space by removing the built
    files from the source tree with the command gmake
    clean. This will preserve the files made by the configure
    program, so that you can rebuild everything with gmake
    later on. To reset the source tree to the state in which it was
    distributed, use gmake distclean. If you are going
    to build for several platforms within the same source tree you must do
-   this and re-configure for each build. (Alternatively, use
+   this and reconfigure for each platform. (Alternatively, use
    a separate build tree for each platform, so that the source tree
    remains unmodified.)

@@ -1673,8 +1675,8 @@ All of PostgreSQL is successfully made. Ready to install.

-   On some systems that have shared libraries (which most systems do)
-   you need to tell your system how to find the newly installed
+   On some systems with shared libraries
+   you need to tell the system how to find the newly installed
    shared libraries. The systems on which this is
    not necessary include BSD/OS, FreeBSD,
@@ -1688,7 +1690,7 @@ All of PostgreSQL is successfully made. Ready to install.

    The method to set the shared library search path varies between
-   platforms, but the most widely usable method is to set the
+   platforms, but the most widely-used method is to set the
    environment variable LD_LIBRARY_PATH like so: In Bourne
    shells (sh, ksh, bash, zsh):
@@ -1724,7 +1726,7 @@ setenv LD_LIBRARY_PATH /usr/local/pgsql/lib

    If in doubt, refer to the manual pages of your system (perhaps
    ld.so or rld). If you later
-   on get a message like
+   get a message like:

psql: error in loading shared libraries
libpq.so.2.1: cannot open shared object file: No such file or directory

@@ -1776,7 +1778,7 @@ libpq.so.2.1: cannot open shared object file: No such file or directory
    To do this, add the following to your shell start-up file, such as
    ~/.bash_profile (or /etc/profile, if you
-   want it to affect every user):
+   want it to affect all users):

PATH=/usr/local/pgsql/bin:$PATH
export PATH

@@ -1807,7 +1809,7 @@ export MANPATH
    server, overriding the compiled-in defaults. If you are going to
    run client applications remotely then it is convenient if every
    user that plans to use the database sets PGHOST. This
-   is not required, however: the settings can be communicated via command
+   is not required, however; the settings can be communicated via command
    line options to most client programs.
@@ -1902,7 +1904,7 @@ kill `cat /usr/local/pgsql/data/postmaster.pid` createdb testdb - Then enter + Then enter: psql testdb @@ -2950,7 +2952,7 @@ LIBOBJS = snprintf.o If you see the linking of the postgres executable abort with an - error message like + error message like: Undefined first referenced symbol in file diff --git a/doc/src/sgml/intro.sgml b/doc/src/sgml/intro.sgml index ccc3c8d772..f94b8fb816 100644 --- a/doc/src/sgml/intro.sgml +++ b/doc/src/sgml/intro.sgml @@ -1,11 +1,11 @@ - + Preface This book is the official documentation of - PostgreSQL. It is being written by the + PostgreSQL. It has been written by the PostgreSQL developers and other volunteers in parallel to the development of the PostgreSQL software. It describes all @@ -58,7 +58,7 @@ contains information for advanced users about the extensibility capabilities of the - server. Topics are, for instance, user-defined data types and + server. Topics include user-defined data types and functions. @@ -148,7 +148,7 @@ And because of the liberal license, PostgreSQL can be used, modified, and - distributed by everyone free of charge for any purpose, be it + distributed by anyone free of charge for any purpose, be it private, commercial, or academic. diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 9334cbede2..44b117e4cd 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1,4 +1,4 @@ - + <application>libpq</application> - C Library @@ -6633,7 +6633,7 @@ myEventProc(PGEventId evtId, void *evtInfo, void *passThrough) #include <libpq-fe.h> If you failed to do that then you will normally get error messages - from your compiler similar to + from your compiler similar to: foo.c: In function `main': foo.c:34: `PGconn' undeclared (first use in this function) @@ -6679,7 +6679,7 @@ CPPFLAGS += -I/usr/local/pgsql/include Failure to specify the correct option to the compiler will - result in an error message such as + result in an error message such as: testlibpq.c:8:22: libpq-fe.h: No such file or directory @@ -6713,7 +6713,7 @@ cc -o testprog testprog1.o testprog2.o -L/usr/local/pgsql/lib -lpq Error messages that point to problems in this area could look like - the following. + the following: testlibpq.o: In function `main': testlibpq.o(.text+0x60): undefined reference to `PQsetdbLogin' diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 8c4f2b7db1..ae36d07832 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1,4 +1,4 @@ - + Monitoring Database Activity @@ -929,7 +929,7 @@ postgres: user database host read() calls issued for the table, index, or database; the number of actual physical reads is usually lower due to kernel-level buffering. The *_blks_read - statistics columns uses this subtraction, i.e. fetched minus hit. + statistics columns uses this subtraction, i.e., fetched minus hit. diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index 43055789be..4637f0ae28 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -1,4 +1,4 @@ - + Concurrency Control @@ -43,7 +43,7 @@ - The main advantage to using the MVCC model of + The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so @@ -246,7 +246,7 @@ committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. 
In effect, a SELECT query sees - a snapshot of the database as of the instant the query begins to + a snapshot of the database at the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive @@ -260,7 +260,7 @@ FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows - that were committed as of the command start time. However, such a target + that were committed before the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or @@ -296,7 +296,7 @@ COMMIT;
If two such transactions concurrently try to change the balance of account
-   12345, we clearly want the second transaction to start from the updated
+   12345, we clearly want the second transaction to start with the updated
    version of the account's row. Because each command is affecting only a
    predetermined row, letting it see the updated version of the row does
    not create any troublesome inconsistency.

@@ -306,7 +306,7 @@ COMMIT;
    More complex usage can produce undesirable results in Read Committed
    mode. For example, consider a DELETE command
    operating on data that is being both added and removed from its
-   restriction criteria by another command, e.g. assume
+   restriction criteria by another command, e.g., assume
    website is a two-row table with
    website.hits equaling 9 and
    10:
@@ -354,7 +354,7 @@ COMMIT;

-   The level Serializable provides the strictest transaction
+   The Serializable isolation level provides the strictest transaction
    isolation. This level emulates serial transaction execution,
    as if transactions had been executed one after another, serially,
    rather than concurrently. However, applications using this level must
@@ -362,19 +362,21 @@ COMMIT;

-   When a transaction is on the serializable level,
-   a SELECT query sees only data committed before the
+   When a transaction is using the serializable level,
+   a SELECT query only sees data committed before the
    transaction began; it never sees either uncommitted data or changes committed
-   during transaction execution by concurrent transactions. (However, the
+   during transaction execution by concurrent transactions. (However,
    SELECT does see the effects of previous updates
    executed within its own transaction, even though they are not yet
-   committed.) This is different from Read Committed in that the
-   SELECT
-   sees a snapshot as of the start of the transaction, not as of the start
+   committed.) This is different from Read Committed in that
+   SELECT in a serializable transaction
+   sees a snapshot as of the start of the transaction, not as of the start
    of the current query within the transaction. Thus, successive
-   SELECT commands within a single transaction always see the same
-   data.
+   SELECT commands within a single
+   transaction see the same data, i.e., they never see changes made by
+   other transactions that committed after their own transaction started.
+   (This behavior can be ideal for reporting applications.)

@@ -382,7 +384,7 @@ COMMIT;
    FOR UPDATE, and SELECT FOR
    SHARE commands behave the same as SELECT
    in terms of searching for target rows: they will only find target rows
-   that were committed as of the transaction start time. However, such a
+   that were committed before the transaction start time. However, such a
    target row might have already been updated (or deleted or locked) by
    another concurrent transaction by the time it is found. In this case, the
@@ -402,9 +404,9 @@ ERROR: could not serialize access due to concurrent update

-   When the application receives this error message, it should abort
-   the current transaction and then retry the whole transaction from
-   the beginning. The second time through, the transaction sees the
+   When an application receives this error message, it should abort
+   the current transaction and retry the whole transaction from
+   the beginning. The second time through, the transaction will see the
    previously-committed change as part of its initial view of the
    database, so there is no logical conflict in using the new version of the
    row as the starting point for the new transaction's update.
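To make the retry requirement concrete, here is a minimal sketch of how this error
arises, assuming two concurrent psql sessions and the accounts table from the
earlier examples:

-- Session 1
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;

-- Session 2, started concurrently
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 12345;
-- ... blocks, waiting for session 1 to commit or abort ...

-- Session 1
COMMIT;

-- Session 2 now fails with
-- ERROR:  could not serialize access due to concurrent update
-- and must issue ROLLBACK and rerun its whole transaction.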
@@ -420,8 +422,8 @@ ERROR: could not serialize access due to concurrent update
    transaction sees a wholly consistent view of the database. However,
    the application has to be prepared to retry transactions when concurrent
    updates make it impossible to sustain the illusion of serial execution.
-   Since the cost of redoing complex transactions might be significant,
-   this mode is recommended only when updating transactions contain logic
+   Since the cost of redoing complex transactions can be significant,
+   serializable mode is recommended only when updating transactions contain logic
    sufficiently complex that they might give wrong answers in Read
    Committed mode. Most commonly, Serializable mode is necessary when
    a transaction executes several successive commands that must see
@@ -449,7 +451,7 @@ ERROR: could not serialize access due to concurrent update
    is not sufficient to guarantee true serializability, and in fact
    PostgreSQL's Serializable mode does
    not guarantee serializable execution in this sense. As an example,
-   consider a table mytab, initially containing
+   consider a table mytab, initially containing:

 class | value
-------+-------
     1 |    10
     1 |    20
     2 |   100
     2 |   200

-   Suppose that serializable transaction A computes
+   Suppose that serializable transaction A computes:

SELECT SUM(value) FROM mytab WHERE class = 1;

    and then inserts the result (30) as the value in a
-   new row with class = 2. Concurrently, serializable
-   transaction B computes
+   new row with class = 2. Concurrently, serializable
+   transaction B computes:

SELECT SUM(value) FROM mytab WHERE class = 2;

    and obtains the result 300, which it inserts in a new row with
-   class = 1. Then both transactions commit. None of
+   class = 1. Then both transactions commit. None of
    the listed undesirable behaviors have occurred, yet we have a result
    that could not have occurred in either order serially. If A had
    executed before B, B would have computed the sum 330, not 300, and
@@ -505,7 +507,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

-   In those cases where the possibility of nonserializable execution
+   In cases where the possibility of non-serializable execution
    is a real hazard, problems can be prevented by appropriate use of
    explicit locking. Further discussion appears in the following
    sections.
@@ -588,7 +590,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

    The SELECT command acquires a lock of this mode on
    referenced tables. In general, any query that only reads a table
    and does not modify it will acquire this lock mode.

@@ -632,7 +634,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
    acquire this lock mode on the target table (in addition to
    ACCESS SHARE locks on any other referenced
    tables). In general, this lock mode will be acquired by any
-   command that modifies the data in a table.
+   command that modifies data in a table.

@@ -664,10 +666,9 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

-   Conflicts with the ROW EXCLUSIVE,
-   SHARE UPDATE EXCLUSIVE, SHARE ROW
-   EXCLUSIVE, EXCLUSIVE, and
-   ACCESS EXCLUSIVE lock modes.
+   Conflicts with all lock modes except ACCESS SHARE,
+   ROW SHARE, and SHARE (it
+   does not conflict with itself).

    This mode protects a table against concurrent data changes.

@@ -684,11 +685,8 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

-   Conflicts with the ROW EXCLUSIVE,
-   SHARE UPDATE EXCLUSIVE,
-   SHARE, SHARE ROW
-   EXCLUSIVE, EXCLUSIVE, and
-   ACCESS EXCLUSIVE lock modes.
+   Conflicts with all lock modes except ACCESS SHARE
+   and ROW SHARE.

@@ -704,11 +702,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

-   Conflicts with the ROW SHARE, ROW
-   EXCLUSIVE, SHARE UPDATE
-   EXCLUSIVE, SHARE, SHARE
-   ROW EXCLUSIVE, EXCLUSIVE, and
-   ACCESS EXCLUSIVE lock modes.
+   Conflicts with all lock modes except ACCESS SHARE.
    This mode allows only concurrent ACCESS SHARE locks,
    i.e., only reads from the table can proceed in parallel with a
    transaction holding this lock mode.

    This lock mode is not automatically acquired on user tables by any
    PostgreSQL command. However it is
-   acquired on certain system catalogs in some operations.
+   acquired during certain internal system catalog operations.

@@ -728,12 +722,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

-   Conflicts with locks of all modes (ACCESS
-   SHARE, ROW SHARE, ROW
-   EXCLUSIVE, SHARE UPDATE
-   EXCLUSIVE, SHARE, SHARE
-   ROW EXCLUSIVE, EXCLUSIVE, and
-   ACCESS EXCLUSIVE).
+   Conflicts with all lock modes.
    This mode guarantees that
    the holder is the only transaction accessing the table in any
    way.

@@ -760,7 +749,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
    Once acquired, a lock is normally held till end of transaction. But if a
    lock is acquired after establishing a savepoint, the lock is released
    immediately if the savepoint is rolled back to. This is consistent with
    the principle that ROLLBACK cancels all effects of the
    commands since the savepoint. The same holds for locks acquired within a
    PL/pgSQL exception block: an error escape from the block
@@ -893,9 +882,9 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
    can be exclusive or shared locks. An exclusive row-level lock on a
    specific row is automatically acquired when the row is updated or
    deleted. The lock is held until the transaction commits or rolls
-   back, in just the same way as for table-level locks. Row-level locks do
-   not affect data querying; they block writers to the same
-   row only.
+   back, like table-level locks. Row-level locks do
+   not affect data querying; they only block writers to the same
+   row.

@@ -917,10 +906,10 @@ SELECT SUM(value) FROM mytab WHERE class = 2;

    PostgreSQL doesn't remember any
-   information about modified rows in memory, so it has no limit to
+   information about modified rows in memory, so there is no limit on
    the number of rows locked at one time. However, locking a row
-   might cause a disk write; thus, for example, SELECT FOR
-   UPDATE will modify selected rows to mark them locked, and so
+   might cause a disk write, e.g., SELECT FOR
+   UPDATE modifies selected rows to mark them locked, and so
    will result in disk writes.

@@ -929,7 +918,7 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
    used to control read/write access to table pages in the shared buffer
    pool. These locks are released immediately after a row is fetched or
    updated. Application developers normally need not be concerned with
-   page-level locks, but we mention them for completeness.
+   page-level locks, but they are mentioned for completeness.

@@ -953,14 +942,14 @@ SELECT SUM(value) FROM mytab WHERE class = 2;
    deadlock situations and resolves them by aborting one of the
    transactions involved, allowing the other(s) to complete.
    (Exactly which transaction will be aborted is difficult to
-   predict and should not be relied on.)
+   predict and should not be relied upon.)
Note that deadlocks can also occur as the result of row-level locks (and thus, they can occur even if explicit locking is not - used). Consider the case in which there are two concurrent - transactions modifying a table. The first transaction executes: + used). Consider the case in which two concurrent + transactions modify a table. The first transaction executes: UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111; @@ -1003,10 +992,10 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222; above, if both transactions had updated the rows in the same order, no deadlock would have occurred. One should also ensure that the first lock acquired on - an object in a transaction is the highest mode that will be + an object in a transaction is the most restrictive mode that will be needed for that object. If it is not feasible to verify this in advance, then deadlocks can be handled on-the-fly by retrying - transactions that are aborted due to deadlock. + transactions that abort due to deadlocks. @@ -1055,7 +1044,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222; and . Care must be taken not to exhaust this - memory or the server will not be able to grant any locks at all. + memory or the server will be unable to grant any locks at all. This imposes an upper limit on the number of advisory locks grantable by the server, typically in the tens to hundreds of thousands depending on how the server is configured. @@ -1068,7 +1057,7 @@ UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222; While a flag stored in a table could be used for the same purpose, advisory locks are faster, avoid MVCC bloat, and are automatically cleaned up by the server at the end of the session. - In certain cases using this method, especially in queries + In certain cases using this advisory locking method, especially in queries involving explicit ordering and LIMIT clauses, care must be taken to control the locks acquired because of the order in which SQL expressions are evaluated. For example: @@ -1109,9 +1098,9 @@ SELECT pg_advisory_lock(q.id) FROM if a row is returned by SELECT it doesn't mean that the row is still current at the instant it is returned (i.e., sometime after the current query began). The row might have been modified or - deleted by an already-committed transaction that committed after this one - started. - Even if the row is still valid now, it could be changed or + deleted by an already-committed transaction that committed after + the SELECT started. + Even if the row is still valid now, it could be changed or deleted before the current transaction does a commit or rollback. @@ -1132,7 +1121,7 @@ SELECT pg_advisory_lock(q.id) FROM concurrent updates one must use SELECT FOR UPDATE, SELECT FOR SHARE, or an appropriate LOCK TABLE statement. (SELECT FOR UPDATE - or SELECT FOR SHARE locks just the + or SELECT FOR SHARE lock just the returned rows against concurrent updates, while LOCK TABLE locks the whole table.) This should be taken into account when porting applications to @@ -1144,10 +1133,10 @@ SELECT pg_advisory_lock(q.id) FROM For example, a banking application might wish to check that the sum of all credits in one table equals the sum of debits in another table, when both tables are being actively updated. Comparing the results of two - successive SELECT sum(...) commands will not work reliably under + successive SELECT sum(...) 
commands will not work reliably in Read Committed mode, since the second query will likely include the results of transactions not counted by the first. Doing the two sums in a - single serializable transaction will give an accurate picture of the + single serializable transaction will give an accurate picture of only the effects of transactions that committed before the serializable transaction started — but one might legitimately wonder whether the answer is still relevant by the time it is delivered. If the serializable transaction @@ -1164,8 +1153,8 @@ SELECT pg_advisory_lock(q.id) FROM Note also that if one is relying on explicit locking to prevent concurrent changes, one should use - Read Committed mode, or in Serializable mode be careful to obtain the - lock(s) before performing queries. A lock obtained by a + either Read Committed mode, or in Serializable mode be careful to obtain + locks before performing queries. A lock obtained by a serializable transaction guarantees that no other transactions modifying the table are still running, but if the snapshot seen by the transaction predates obtaining the lock, it might predate some now-committed @@ -1173,7 +1162,7 @@ SELECT pg_advisory_lock(q.id) FROM frozen at the start of its first query or data-modification command (SELECT, INSERT, UPDATE, or DELETE), so - it's possible to obtain locks explicitly before the snapshot is + it is often desirable to obtain locks explicitly before the snapshot is frozen. @@ -1189,7 +1178,7 @@ SELECT pg_advisory_lock(q.id) FROM Though PostgreSQL provides nonblocking read/write access to table - data, nonblocking read/write access is not currently offered for every + data, nonblocking read/write access is currently not offered for every index access method implemented in PostgreSQL. The various index types are handled as follows: @@ -1232,8 +1221,8 @@ SELECT pg_advisory_lock(q.id) FROM Short-term share/exclusive page-level locks are used for read/write access. Locks are released immediately after each - index row is fetched or inserted. But note that a GIN-indexed - value insertion usually produces several index key insertions + index row is fetched or inserted. But note insertion of a GIN-indexed + value usually produces several index key insertions per row, so GIN might do substantial work for a single value's insertion. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index aea529552c..8744a5cb31 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1,4 +1,4 @@ - + Performance Tips @@ -9,7 +9,7 @@ Query performance can be affected by many things. Some of these can - be manipulated by the user, while others are fundamental to the underlying + be controlled by the user, while others are fundamental to the underlying design of the system. This chapter provides some hints about understanding and tuning PostgreSQL performance. @@ -27,10 +27,10 @@ PostgreSQL devises a query - plan for each query it is given. Choosing the right + plan for each query it receives. Choosing the right plan to match the query structure and the properties of the data is absolutely critical for good performance, so the system includes - a complex planner that tries to select good plans. + a complex planner that tries to choose good plans. You can use the command to see what query plan the planner creates for any query. @@ -40,14 +40,13 @@ The structure of a query plan is a tree of plan nodes. 
-   Nodes at the bottom level are table scan nodes: they return raw rows
+   Nodes at the bottom level of the tree are table scan nodes: they return raw rows
    from a table. There are different types of scan nodes for different
    table access methods: sequential scans, index scans, and bitmap index
    scans.
    If the query requires joining, aggregation, sorting, or other
    operations on the raw rows, then there will be additional nodes
-   atop the scan nodes to perform these operations. Again,
-   there is usually more than one possible way to do these operations,
-   so different node types can appear here too. The output
+   above the scan nodes to perform these operations. Other node types
+   can also appear here. The output
    of EXPLAIN has one line for each node in the plan
    tree, showing the basic node type plus the cost estimates that the
    planner made for the execution of that plan node. The first line
    (topmost node)
@@ -56,15 +55,15 @@

-   Here is a trivial example, just to show what the output looks like.
+   Here is a trivial example, just to show what the output looks like:

    Examples in this section are drawn from the regression test database
    after doing a VACUUM ANALYZE, using 8.2 development sources.
    You should be able to get similar results if you try the examples yourself,
-   but your estimated costs and row counts will probably vary slightly
+   but your estimated costs and row counts might vary slightly
    because ANALYZE's statistics are random samples rather
-   than being exact.
+   than exact.

@@ -78,22 +77,23 @@ EXPLAIN SELECT * FROM tenk1;

-   The numbers that are quoted by EXPLAIN are:
+   The numbers that are quoted by EXPLAIN are (left
+   to right):

-   Estimated start-up cost (Time expended before output scan can start,
-   e.g., time to do the sorting in a sort node.)
+   Estimated start-up cost: the time expended before the output scan can
+   start, e.g., the time needed to do the sorting in a sort node

-   Estimated total cost (If all rows were to be retrieved, though they might
-   not be: for example, a query with a LIMIT clause will stop
-   short of paying the total cost of the Limit plan node's
-   input node.)
+   Estimated total cost if all rows were to be retrieved (though they might
+   not be, e.g., a query with a LIMIT clause will stop
+   short of paying the total cost of the Limit node's
+   input node)

@@ -119,8 +119,8 @@ EXPLAIN SELECT * FROM tenk1;
    Traditional practice is to measure the costs in units of disk page
    fetches; that is, is conventionally set to 1.0
    and the other cost parameters are set relative
-   to that. The examples in this section are run with the default cost
-   parameters.
+   to that. (The examples in this section are run with the default cost
+   parameters.)

@@ -129,17 +129,18 @@ EXPLAIN SELECT * FROM tenk1;
    the cost only reflects things that the planner cares about.
    In particular, the cost does not consider the time spent transmitting
    result rows to the client, which could be an important
-   factor in the true elapsed time; but the planner ignores it because
+   factor in the total elapsed time; but the planner ignores it because
    it cannot change it by altering the plan. (Every correct plan will
    output the same row set, we trust.)

-   Rows output is a little tricky because it is not the
+   The EXPLAIN rows= value is a little tricky
+   because it is not the
    number of rows processed or scanned by the plan node. It is usually less,
    reflecting the estimated selectivity of any WHERE-clause
    conditions that are being
-   applied at the node. Ideally the top-level rows estimate will
+   applied at the node.
Ideally the top-level rows estimate will approximate the number of rows actually returned, updated, or deleted by the query. @@ -163,16 +164,16 @@ EXPLAIN SELECT * FROM tenk1; SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1'; - you will find out that tenk1 has 358 disk - pages and 10000 rows. The estimated cost is (disk pages read * + you will find that tenk1 has 358 disk + pages and 10000 rows. The estimated cost is computed as (disk pages read * ) + (rows scanned * ). By default, - seq_page_cost is 1.0 and cpu_tuple_cost is 0.01. - So the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458. + seq_page_cost is 1.0 and cpu_tuple_cost is 0.01, + so the estimated cost is (358 * 1.0) + (10000 * 0.01) = 458. - Now let's modify the query to add a WHERE condition: + Now let's modify the original query to add a WHERE condition: EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; @@ -187,7 +188,7 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000; clause being applied as a filter condition; this means that the plan node checks the condition for each row it scans, and outputs only the ones that pass the condition. - The estimate of output rows has gone down because of the WHERE + The estimate of output rows has been reduced because of the WHERE clause. However, the scan will still have to visit all 10000 rows, so the cost hasn't decreased; in fact it has gone up a bit (by 10000 * - The actual number of rows this query would select is 7000, but the rows + The actual number of rows this query would select is 7000, but the rows= estimate is only approximate. If you try to duplicate this experiment, you will probably get a slightly different estimate; moreover, it will change after each ANALYZE command, because the @@ -224,16 +225,16 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100; from the table itself. Fetching the rows separately is much more expensive than sequentially reading them, but because not all the pages of the table have to be visited, this is still cheaper than a sequential - scan. (The reason for using two levels of plan is that the upper plan + scan. (The reason for using two plan levels is that the upper plan node sorts the row locations identified by the index into physical order - before reading them, so as to minimize the costs of the separate fetches. + before reading them, to minimize the cost of separate fetches. The bitmap mentioned in the node names is the mechanism that does the sorting.) If the WHERE condition is selective enough, the planner might - switch to a simple index scan plan: + switch to a simple index scan plan: EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 3; @@ -247,8 +248,8 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 3; In this case the table rows are fetched in index order, which makes them even more expensive to read, but there are so few that the extra cost of sorting the row locations is not worth it. You'll most often see - this plan type for queries that fetch just a single row, and for queries - that request an ORDER BY condition that matches the index + this plan type in queries that fetch just a single row, and for queries + with an ORDER BY condition that matches the index order. @@ -271,11 +272,11 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 3 AND stringu1 = 'xxx'; cannot be applied as an index condition (since this index is only on the unique1 column). Instead it is applied as a filter on the rows retrieved by the index. Thus the cost has actually gone up - a little bit to reflect this extra checking. 
+ slightly to reflect this extra checking. - If there are indexes on several columns used in WHERE, the + If there are indexes on several columns referenced in WHERE, the planner might choose to use an AND or OR combination of the indexes: @@ -302,7 +303,9 @@ EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000; Let's try joining two tables, using the columns we have been discussing: -EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN -------------------------------------------------------------------------------------- @@ -317,12 +320,12 @@ EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique - In this nested-loop join, the outer scan is the same bitmap index scan we + In this nested-loop join, the outer scan (upper) is the same bitmap index scan we saw earlier, and so its cost and row count are the same because we are applying the WHERE clause unique1 < 100 at that node. The t1.unique2 = t2.unique2 clause is not relevant yet, - so it doesn't affect row count of the outer scan. For the inner scan, the + so it doesn't affect the row count of the outer scan. For the inner (lower) scan, the unique2 value of the current outer-scan row is plugged into the inner index scan to produce an index condition like t2.unique2 = constant. @@ -335,8 +338,8 @@ EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique In this example the join's output row count is the same as the product - of the two scans' row counts, but that's not true in general, because - in general you can have WHERE clauses that mention both tables + of the two scans' row counts, but that's not true in all cases because + you can have WHERE clauses that mention both tables and so can only be applied at the join point, not to either input scan. For example, if we added WHERE ... AND t1.hundred < t2.hundred, @@ -346,14 +349,16 @@ EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique One way to look at variant plans is to force the planner to disregard - whatever strategy it thought was the winner, using the enable/disable + whatever strategy it thought was the cheapest, using the enable/disable flags described in . (This is a crude tool, but useful. See also .) SET enable_nestloop = off; -EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; +EXPLAIN SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ------------------------------------------------------------------------------------------ @@ -370,9 +375,9 @@ EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique This plan proposes to extract the 100 interesting rows of tenk1 using that same old index scan, stash them into an in-memory hash table, and then do a sequential scan of tenk2, probing into the hash table - for possible matches of t1.unique2 = t2.unique2 at each tenk2 row. - The cost to read tenk1 and set up the hash table is entirely start-up - cost for the hash join, since we won't get any rows out until we can + for possible matches of t1.unique2 = t2.unique2 for each tenk2 row. + The cost to read tenk1 and set up the hash table is a start-up + cost for the hash join, since there will be no output until we can start reading tenk2. 
The total time estimate for the join also includes a hefty charge for the CPU time to probe the hash table 10000 times. Note, however, that we are not charging 10000 times 232.35; @@ -380,14 +385,16 @@ EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique - It is possible to check on the accuracy of the planner's estimated costs + It is possible to check the accuracy of the planner's estimated costs by using EXPLAIN ANALYZE. This command actually executes the query, and then displays the true run time accumulated within each plan node along with the same estimated costs that a plain EXPLAIN shows. For example, we might get a result like this: -EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; +EXPLAIN ANALYZE SELECT * +FROM tenk1 t1, tenk2 t2 +WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- @@ -402,7 +409,7 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t Note that the actual time values are in milliseconds of - real time, whereas the cost estimates are expressed in + real time, whereas the cost= estimates are expressed in arbitrary units; so they are unlikely to match up. The thing to pay attention to is whether the ratios of actual time and estimated costs are consistent. @@ -412,11 +419,11 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t In some query plans, it is possible for a subplan node to be executed more than once. For example, the inner index scan is executed once per outer row in the above nested-loop plan. In such cases, the - loops value reports the + loops= value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown. Multiply by - the loops value to get the total time actually spent in + the loops= value to get the total time actually spent in the node. @@ -429,9 +436,9 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t reported for the top-level plan node. For INSERT, UPDATE, and DELETE commands, the total run time might be considerably larger, because it includes the time spent processing - the result rows. In these commands, the time for the top plan node - essentially is the time spent computing the new rows and/or locating the - old ones, but it doesn't include the time spent applying the changes. + the result rows. For these commands, the time for the top plan node is + essentially the time spent locating the old rows and/or computing + the new ones, but it doesn't include the time spent applying the changes. Time spent firing triggers, if any, is also outside the top plan node, and is shown separately for each trigger. 
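Keep in mind that EXPLAIN ANALYZE actually executes the statement, so measuring
a data-modifying command this way changes your data. A minimal sketch of the
usual workaround, here assuming the regression database's tenk1 table, is to
wrap the command in a transaction that is rolled back:

BEGIN;
EXPLAIN ANALYZE UPDATE tenk1 SET hundred = hundred + 1 WHERE unique1 < 100;
ROLLBACK;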
@@ -475,7 +482,9 @@ EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t queries similar to this one: -SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%'; +SELECT relname, relkind, reltuples, relpages +FROM pg_class +WHERE relname LIKE 'tenk1%'; relname | relkind | reltuples | relpages ----------------------+---------+-----------+---------- @@ -512,7 +521,7 @@ SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 't Most queries retrieve only a fraction of the rows in a table, due - to having WHERE clauses that restrict the rows to be + to WHERE clauses that restrict the rows to be examined. The planner thus needs to make an estimate of the selectivity of WHERE clauses, that is, the fraction of rows that match each condition in the @@ -544,7 +553,9 @@ SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 't For example, we might do: -SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road'; +SELECT attname, n_distinct, most_common_vals +FROM pg_stats +WHERE tablename = 'road'; attname | n_distinct | most_common_vals ---------+------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- @@ -769,7 +780,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; - Turn off autocommit and just do one commit at the end. (In plain + When doing INSERTs, turn off autocommit and just do + one commit at the end. (In plain SQL, this means issuing BEGIN at the start and COMMIT at the end. Some client libraries might do this behind your back, in which case you need to make sure the @@ -812,7 +824,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Note that loading a large number of rows using COPY is almost always faster than using - INSERT, even if PREPARE is used and + INSERT, even if the PREPARE ... INSERT is used and multiple insertions are batched into a single transaction. @@ -823,7 +835,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway. However, this consideration does not apply when - is set, as all commands + is on, as all commands must write WAL in that case. @@ -833,7 +845,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Remove Indexes - If you are loading a freshly created table, the fastest way is to + If you are loading a freshly created table, the fastest method is to create the table, bulk load the table's data using COPY, then create any indexes needed for the table. Creating an index on pre-existing data is quicker than @@ -844,8 +856,8 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; If you are adding large amounts of data to an existing table, it might be a win to drop the index, load the table, and then recreate the index. Of course, the - database performance for other users might be adversely affected - during the time that the index is missing. One should also think + database performance for other users might suffer + during the time the index is missing. 
One should also think twice before dropping unique indexes, since the error checking afforded by the unique constraint will be lost while the index is missing. diff --git a/doc/src/sgml/pgbuffercache.sgml b/doc/src/sgml/pgbuffercache.sgml index ff8a381322..ef659ae12a 100644 --- a/doc/src/sgml/pgbuffercache.sgml +++ b/doc/src/sgml/pgbuffercache.sgml @@ -1,4 +1,4 @@ - + pg_buffercache @@ -141,7 +141,8 @@ b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname - ORDER BY 2 DESC LIMIT 10; + ORDER BY 2 DESC + LIMIT 10; relname | buffers ---------------------------------+--------- tenk2 | 345 diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 041bcfa732..296ad5bb94 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -1,4 +1,4 @@ - + . + should see . @@ -127,14 +127,14 @@ self-contained and can be read individually as desired. The information in this part is presented in a narrative fashion in topical units. Readers looking for a complete description of a - particular command should look into . + particular command should see . - The first few chapters are written so that they can be understood - without prerequisite knowledge, so that new users who need to set + The first few chapters are written so they can be understood + without prerequisite knowledge, so new users who need to set up their own server can begin their exploration with this part. - The rest of this part is about tuning and management; that material + The rest of this part is about tuning and management; the material assumes that the reader is familiar with the general use of the PostgreSQL database system. Readers are encouraged to look at and + Bug Reporting Guidelines @@ -136,7 +136,7 @@ file that can be run through the psql frontend that shows the problem. (Be sure to not have anything in your ~/.psqlrc start-up file.) An easy - start at this file is to use pg_dump + way to create this file is to use pg_dump to dump out the table declarations and data needed to set the scene, then add the problem query. You are encouraged to minimize the size of your example, but this is not absolutely @@ -252,7 +252,7 @@ C library, processor, memory information, and so on. In most cases it is sufficient to report the vendor and version, but do not assume everyone knows what exactly Debian - contains or that everyone runs on Pentiums. If you have + contains or that everyone runs on i386s. If you have installation problems then information about the toolchain on your machine (compiler, make, and so on) is also necessary. diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index 71a33fff66..2fc3b92f8d 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -1,4 +1,4 @@ - + Queries @@ -14,7 +14,7 @@ The previous chapters explained how to create tables, how to fill them with data, and how to manipulate that data. Now we finally - discuss how to retrieve the data out of the database. + discuss how to retrieve the data from the database. @@ -63,7 +63,7 @@ SELECT a, b + c FROM table1; - FROM table1 is a particularly simple kind of + FROM table1 is a simple kind of table expression: it reads just one table. In general, table expressions can be complex constructs of base tables, joins, and subqueries. 
But you can also omit the table expression entirely and @@ -133,8 +133,8 @@ FROM table_reference , table_r When a table reference names a table that is the parent of a - table inheritance hierarchy, the table reference produces rows of - not only that table but all of its descendant tables, unless the + table inheritance hierarchy, the table reference produces rows + not only of that table but all of its descendant tables, unless the key word ONLY precedes the table name. However, the reference produces only the columns that appear in the named table — any columns added in subtables are ignored. @@ -174,11 +174,12 @@ FROM table_reference , table_r - For each combination of rows from + Produce every possible combination of rows from T1 and - T2, the derived table will contain a - row consisting of all columns in T1 - followed by all columns in T2. If + T2 (i.e., a Cartesian product), + with output columns consisting of + all T1 columns + followed by all T2 columns. If the tables have N and M rows respectively, the joined table will have N * M rows. @@ -242,14 +243,15 @@ FROM table_reference , table_r comma-separated list of column names, which the joined tables must have in common, and forms a join condition specifying equality of each of these pairs of columns. Furthermore, the - output of a JOIN USING has one column for each of - the equated pairs of input columns, followed by all of the + output of JOIN USING has one column for each of + the equated pairs of input columns, followed by the other columns from each table. Thus, USING (a, b, c) is equivalent to ON (t1.a = t2.a AND t1.b = t2.b AND t1.c = t2.c) with the exception that if ON is used there will be two columns a, b, and c in the result, - whereas with USING there will be only one of each. + whereas with USING there will be only one of each + (and they will appear first if SELECT * is used). @@ -262,7 +264,7 @@ FROM table_reference , table_r Finally, NATURAL is a shorthand form of USING: it forms a USING list - consisting of exactly those column names that appear in both + consisting of all column names that appear in both input tables. As with USING, these columns appear only once in the output table. @@ -298,8 +300,8 @@ FROM table_reference , table_r First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in - T2, a joined row is added with null values in columns of - T2. Thus, the joined table unconditionally has at least + T2, a row is added with null values in columns of + T2. Thus, the joined table always has at least one row for each row in T1. @@ -321,9 +323,9 @@ FROM table_reference , table_r First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in - T1, a joined row is added with null values in columns of + T1, a row is added with null values in columns of T1. This is the converse of a left join: the result table - will unconditionally have a row for each row in T2. + will always have a row for each row in T2. @@ -335,9 +337,9 @@ FROM table_reference , table_r First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in - T2, a joined row is added with null values in columns of + T2, a row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the - join condition with any row in T1, a joined row with null + join condition with any row in T1, a row with null values in the columns of T1 is added. 
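As a quick sketch of how the three outer-join types behave in practice (reusing the sample tables t1 and t2 from the examples below; only the join keyword changes):

SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num;   -- every t1 row appears at least once
SELECT * FROM t1 RIGHT JOIN t2 ON t1.num = t2.num;  -- every t2 row appears at least once
SELECT * FROM t1 FULL JOIN t2 ON t1.num = t2.num;   -- unmatched rows from either side appear, extended with nulls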
@@ -350,8 +352,8 @@ FROM table_reference , table_r
 Joins of all types can be chained together or nested: either or
- both of T1 and
- T2 might be joined tables. Parentheses
+ both T1 and
+ T2 can be joined tables. Parentheses
 can be used around JOIN clauses to control the join
 order. In the absence of parentheses, JOIN clauses
 nest left-to-right.
@@ -460,6 +462,19 @@ FROM table_reference , table_r
 3 | c | |
(3 rows)
+ Notice that placing the restriction in the WHERE clause
+ produces a different result:
+
+=> SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx';
+ num | name | num | value
+-----+------+-----+-------
+ 1 | a | 1 | xxx
+(1 row)
+
+ This is because a restriction placed in the ON
+ clause is processed before the join, while
+ a restriction placed in the WHERE clause is processed
+ after the join.
@@ -513,7 +528,7 @@ SELECT * FROM some_very_long_table_name s JOIN another_fairly_long_name a ON s.i
SELECT * FROM my_table AS m WHERE my_table.a > 5;
 is not valid according to the SQL standard. In
- PostgreSQL this will draw an error if the
+ PostgreSQL this will draw an error, assuming the
 configuration variable is off (as it is
 by default). If it is on, an implicit table reference will be
 added to the
@@ -559,8 +574,8 @@ FROM table_reference AS
 When an alias is applied to the output of a JOIN
- clause, using any of these forms, the alias hides the original
- names within the JOIN. For example:
+ clause, the alias hides the original
+ name referenced in the JOIN. For example:
SELECT a.* FROM my_table AS a JOIN your_table AS b ON ...
@@ -568,7 +583,7 @@ SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
- is not valid: the table alias a is not visible
+ is not valid; the table alias a is not visible
 outside the alias c.
@@ -631,7 +646,7 @@ FROM (VALUES ('anne', 'smith'), ('bob', 'jones'), ('joe', 'blow'))
 If a table function returns a base data type, the single result
- column is named like the function. If the function returns a
+ column name matches the function name. If the function returns a
 composite type, the result columns get the same names as the
 individual attributes of the type.
@@ -655,8 +670,11 @@ $$ LANGUAGE SQL;
SELECT * FROM getfoo(1) AS t1;
SELECT * FROM foo
- WHERE foosubid IN (select foosubid from getfoo(foo.fooid) z
- where z.fooid = foo.fooid);
+ WHERE foosubid IN (
+ SELECT foosubid
+ FROM getfoo(foo.fooid) z
+ WHERE z.fooid = foo.fooid
+ );
CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
@@ -668,13 +686,14 @@ SELECT * FROM vw_getfoo;
 In some cases it is useful to define table functions that can
 return different column sets depending on how they are invoked.
 To support this, the table function can be declared as returning
- the pseudotype record. When such a function is used in
+ the pseudotype record rather than a specific composite type.
+ When such a function is used in
 a query, the expected row structure must be specified in the
 query itself, so that the system can know how to parse and plan the
 query. Consider this example:
SELECT *
- FROM dblink('dbname=mydb', 'select proname, prosrc from pg_proc')
+ FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
 AS t1(proname name, prosrc text)
 WHERE proname LIKE 'bytea%';
@@ -710,9 +729,9 @@ WHERE search_condition
 After the processing of the FROM clause is done, each row of the
 derived virtual table is checked against the search condition.
If the result of the condition is true, the row is - kept in the output table, otherwise (that is, if the result is + kept in the output table, otherwise (i.e., if the result is false or null) it is discarded. The search condition typically - references at least some column of the table generated in the + references at least one column of the table generated in the FROM clause; this is not required, but otherwise the WHERE clause will be fairly useless. @@ -735,11 +754,12 @@ FROM a NATURAL JOIN b WHERE b.val > 5 Which one of these you use is mainly a matter of style. The JOIN syntax in the FROM clause is - probably not as portable to other SQL database management systems. For - outer joins there is no choice in any case: they must be done in - the FROM clause. An ON/USING + probably not as portable to other SQL database management systems, + even though it is in the SQL standard. For + outer joins there is no choice: they must be done in + the FROM clause. The ON/USING clause of an outer join is not equivalent to a - WHERE condition, because it determines the addition + WHERE condition, because it affects the addition of rows (for unmatched input rows) as well as the removal of rows from the final result. @@ -760,7 +780,7 @@ SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1) - fdt is the table derived in the + fdt is the table used in the FROM clause. Rows that do not meet the search condition of the WHERE clause are eliminated from fdt. Notice the use of scalar subqueries as @@ -803,11 +823,11 @@ SELECT select_list The is - used to group together those rows in a table that share the same + used to group together those rows in a table that have the same values in all the columns listed. The order in which the columns are listed does not matter. The effect is to combine each set - of rows sharing common values into one group row that is - representative of all rows in the group. This is done to + of rows having common values into one group row that + represents all rows in the group. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups. For instance: @@ -840,7 +860,7 @@ SELECT select_list In general, if a table is grouped, columns that are not - used in the grouping cannot be referenced except in aggregate + the same in the group cannot be referenced except in aggregate expressions. An example with aggregate expressions is: => SELECT x, sum(y) FROM test1 GROUP BY x; @@ -860,7 +880,7 @@ SELECT select_list Grouping without aggregate expressions effectively calculates the - set of distinct values in a column. This can also be achieved + set of distinct values in a column. This can more clearly be achieved using the DISTINCT clause (see ). @@ -868,7 +888,7 @@ SELECT select_list Here is another example: it calculates the total sales for each - product (rather than the total sales on all products): + product (rather than the total sales of all products): SELECT product_id, p.name, (sum(s.units) * p.price) AS sales FROM products p LEFT JOIN sales s USING (product_id) @@ -877,10 +897,10 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in - the query select list. (Depending on how exactly the products + the query select list. 
(Depending on how the products
 table is set up, name and price might be fully dependent on the
 product ID, so the additional groupings could theoretically be
- unnecessary, but this is not implemented yet.) The column
+ unnecessary, though this is not implemented.) The column
 s.units does not have to be in the GROUP
 BY list since it is only used in an aggregate expression
 (sum(...)), which represents the sales
@@ -901,11 +921,11 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
- If a table has been grouped using a GROUP BY
- clause, but then only certain groups are of interest, the
+ If a table has been grouped using GROUP BY,
+ but only certain groups are of interest, the
 HAVING clause can be used, much like a
- WHERE clause, to eliminate groups from a grouped
- table. The syntax is:
+ WHERE clause, to eliminate groups from the result.
+ The syntax is:
SELECT select_list FROM ... WHERE ... GROUP BY ... HAVING boolean_expression
@@ -1068,8 +1088,7 @@ SELECT tbl1.*, tbl2.a FROM ...
 the row's values substituted for any column references. But the
 expressions in the select list do not have to reference any
 columns in the table expression of the FROM clause;
- they could be constant arithmetic expressions as well, for
- instance.
+ they can be constant arithmetic expressions as well.
@@ -1083,9 +1102,8 @@ SELECT tbl1.*, tbl2.a FROM ...
 The entries in the select list can be assigned names for further
- processing. The further processing in this case is
- an optional sort specification and the client application (e.g.,
- column headers for display). For example:
+ processing, perhaps for reference in an ORDER BY clause
+ or for display by the client application. For example:
SELECT a AS value, b + c AS sum FROM ...
@@ -1122,8 +1140,8 @@ SELECT a "value", b + c AS sum FROM ...
 The naming of output columns here is different from that done in
 the FROM clause (see ). This pipeline will in fact
- allow you to rename the same column twice, but the name chosen in
+ linkend="queries-table-aliases">). It is possible
+ to rename the same column twice, but the name used in
 the select list is the one that will be passed on.
@@ -1181,7 +1199,7 @@ SELECT DISTINCT ON (expression ,
 DISTINCT ON clause is not part of the SQL standard
 and is sometimes considered bad style because of the potentially
 indeterminate nature of its results. With judicious use of
- GROUP BY and subqueries in FROM the
+ GROUP BY and subqueries in FROM,
 this construct can be avoided, but it is often the most convenient
 alternative.
@@ -1229,7 +1247,7 @@ SELECT DISTINCT ON (expression ,
query1 UNION query2 UNION query3
- which really says
+ which is executed as:
(query1 UNION query2) UNION query3
@@ -1328,9 +1346,9 @@ SELECT a, b FROM table1 ORDER BY a + b, c;
 The NULLS FIRST and NULLS LAST options can be
 used to determine whether nulls appear before or after non-null values
- in the sort ordering. By default, null values sort as if larger than any
- non-null value; that is, NULLS FIRST is the default for
- DESC order, and NULLS LAST otherwise.
+ in the sort ordering. The default behavior is for null values to sort as
+ if larger than all non-null values (NULLS LAST), except
+ in DESC ordering, where NULLS FIRST is the default.
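A minimal sketch of these defaults, assuming a hypothetical table t with a nullable column col:

SELECT * FROM t ORDER BY col;                  -- ascending: nulls sort last by default
SELECT * FROM t ORDER BY col DESC;             -- descending: nulls sort first by default
SELECT * FROM t ORDER BY col ASC NULLS FIRST;  -- explicit override of the default placement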
@@ -1341,15 +1359,14 @@ SELECT a, b FROM table1 ORDER BY a + b, c; - For backwards compatibility with the SQL92 version of the standard, - a sort_expression can instead be the name or number + A sort_expression can also be the column label or number of an output column, as in: SELECT a + b AS sum, c FROM table1 ORDER BY sum; SELECT a, max(b) FROM table1 GROUP BY a ORDER BY 1; both of which sort by the first output column. Note that an output - column name has to stand alone, it's not allowed as part of an expression + column name has to stand alone, e.g., it cannot be used in an expression — for example, this is not correct: SELECT a + b AS sum, c FROM table1 ORDER BY sum + c; -- wrong @@ -1412,16 +1429,16 @@ SELECT select_list When using LIMIT, it is important to use an - ORDER BY clause that constrains the result rows into a + ORDER BY clause that constrains the result rows in a unique order. Otherwise you will get an unpredictable subset of the query's rows. You might be asking for the tenth through - twentieth rows, but tenth through twentieth in what ordering? The + twentieth rows, but tenth through twentieth using what ordering? The ordering is unknown, unless you specified ORDER BY. The query optimizer takes LIMIT into account when - generating a query plan, so you are very likely to get different + generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select @@ -1455,7 +1472,7 @@ SELECT select_list VALUES ( expression [, ...] ) [, ...] - Each parenthesized list of expressions generates a row in the table. + Each parenthesized list of expressions generates a row in the table expression. The lists must all have the same number of elements (i.e., the number of columns in the table), and corresponding entries in each list must have compatible data types. The actual data type assigned to each column @@ -1489,12 +1506,12 @@ SELECT 3, 'three'; Syntactically, VALUES followed by expression lists is - treated as equivalent to + treated as equivalent to: SELECT select_list FROM table_expression and can appear anywhere a SELECT can. For example, you can - use it as an arm of a UNION, or attach a + use it as part of a UNION, or attach a sort_specification (ORDER BY, LIMIT, and/or OFFSET) to it. VALUES is most commonly used as the data source in an INSERT command, diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index ffc641b03a..c81c321134 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -1,4 +1,4 @@ - + The <acronym>SQL</acronym> Language @@ -38,7 +38,7 @@ functions and types. (If you installed a pre-packaged version of PostgreSQL rather than building from source, look for a directory named tutorial within the - PostgreSQL documentation. The make + PostgreSQL distribution. The make part should already have been done for you.) Then, to start the tutorial, do the following: @@ -53,7 +53,7 @@ The \i command reads in commands from the - specified file. The -s option puts you in + specified file. The psql -s option puts you in single step mode which pauses before sending each statement to the server. The commands used in this section are in the file basics.sql. @@ -165,7 +165,7 @@ CREATE TABLE weather ( and a rich set of geometric types. PostgreSQL can be customized with an arbitrary number of user-defined data types. 
Consequently, type - names are not syntactical key words, except where required to + names are not special key words in the syntax except where required to support special cases in the SQL standard. @@ -421,7 +421,7 @@ SELECT DISTINCT city DISTINCT automatically orders the rows and so ORDER BY is unnecessary. But this is not required by the SQL standard, and current - PostgreSQL doesn't guarantee that + PostgreSQL does not guarantee that DISTINCT causes the rows to be ordered. @@ -451,8 +451,8 @@ SELECT DISTINCT city join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of - each row of the weather table with the name column of all rows in - the cities table, and select the pairs of rows where these values match. + each row of the weather table with the name column of all rows in + the cities table, and select the pairs of rows where these values match. This is only a conceptual model. The join is usually performed @@ -486,7 +486,7 @@ SELECT * There is no result row for the city of Hayward. This is because there is no matching entry in the cities table for Hayward, so the join - ignores the unmatched rows in the weather table. We will see + ignores the unmatched rows in the weather table. We will see shortly how this can be fixed. @@ -494,9 +494,9 @@ SELECT * There are two columns containing the city name. This is - correct because the lists of columns of the + correct because the columns from the weather and the - cities table are concatenated. In + cities tables are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than using *: @@ -514,14 +514,14 @@ SELECT city, temp_lo, temp_hi, prcp, date, location Exercise: - Attempt to find out the semantics of this query when the + Attempt to determine the semantics of this query when the WHERE clause is omitted. Since the columns all had different names, the parser - automatically found out which table they belong to. If there + automatically found which table they belong to. If there were duplicate column names in the two tables you'd need to qualify the column names to show which one you meant, as in: diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml index ac8680412c..4c6fb5c569 100644 --- a/doc/src/sgml/regress.sgml +++ b/doc/src/sgml/regress.sgml @@ -1,4 +1,4 @@ - + Regression Tests @@ -37,7 +37,7 @@ To run the regression tests after building but before installation, - type + type: gmake check @@ -45,7 +45,7 @@ gmake check src/test/regress and run the command there.) This will first build several auxiliary files, such as some sample user-defined trigger functions, and then run the test driver - script. At the end you should see something like + script. At the end you should see something like: ======================= @@ -64,7 +64,7 @@ gmake check If you already did the build as root, you do not have to start all over. Instead, make the regression test directory writable by some other user, log in as that user, and restart the tests. - For example + For example: root# chmod -R a+w src/test/regress root# su - joeuser @@ -101,7 +101,7 @@ gmake check make sure this limit is at least fifty or so, else you might get random-seeming failures in the parallel test. If you are not in a position to raise the limit, you can cut down the degree of parallelism - by setting the MAX_CONNECTIONS parameter. 
For example,
+ by setting the MAX_CONNECTIONS parameter. For example:
gmake MAX_CONNECTIONS=10 check
@@ -111,11 +111,11 @@ gmake MAX_CONNECTIONS=10 check
 To run the tests after installation)]]>, initialize a data area and start the
- server, , ]]> then type
+ server, , ]]> then type:
gmake installcheck
-or for a parallel test
+or for a parallel test:
gmake installcheck-parallel
@@ -130,14 +130,14 @@ gmake installcheck-parallel
 At present, these tests can be used only against an already-installed
 server. To run the tests for all procedural languages that have been
 built and installed, change to the src/pl directory of the
- build tree and type
+ build tree and type:
gmake installcheck
 You can also do this in any of the subdirectories of src/pl
 to run tests for just one procedural language. To run the tests for all
 contrib modules that have them, change to the
- contrib directory of the build tree and type
+ contrib directory of the build tree and type:
gmake installcheck
@@ -479,7 +479,7 @@ gmake coverage-html
- To reset the execution counts between test runs, run
+ To reset the execution counts between test runs, run:
gmake coverage-clean
diff --git a/doc/src/sgml/rowtypes.sgml b/doc/src/sgml/rowtypes.sgml
index c19ee8e3f7..d699c39f4a 100644
--- a/doc/src/sgml/rowtypes.sgml
+++ b/doc/src/sgml/rowtypes.sgml
@@ -1,4 +1,4 @@
- +
 Composite Types
@@ -12,9 +12,9 @@
- A composite type describes the structure of a row or record;
- it is in essence just a list of field names and their data types.
- PostgreSQL allows values of composite types to be
+ A composite type represents the structure of a row or record;
+ it is essentially just a list of field names and their data types.
+ PostgreSQL allows composite types to be
 used in many of the same ways that simple types can be used. For example, a
 column of a table can be declared to be of a composite type.
@@ -39,9 +39,9 @@ CREATE TYPE inventory_item AS (
 The syntax is comparable to CREATE TABLE, except that only field
 names and types can be specified; no constraints (such as NOT
 NULL) can presently be included. Note that the AS keyword
- is essential; without it, the system will think a quite different kind
- of CREATE TYPE command is meant, and you'll get odd syntax
- errors.
+ is essential; without it, the system will think a different kind
+ of CREATE TYPE command is meant, and you will get odd syntax
+ errors.
@@ -68,8 +68,8 @@ SELECT price_extension(item, 10) FROM on_hand;
- Whenever you create a table, a composite type is also automatically
- created, with the same name as the table, to represent the table's
+ Whenever you create a table, a composite type is automatically
+ created also, with the same name as the table, to represent the table's
 row type. For example, had we said:
CREATE TABLE inventory_item (
@@ -135,7 +135,7 @@ CREATE TABLE inventory_item (
 The ROW expression syntax can also be used to
 construct composite values. In most cases this is considerably
- simpler to use than the string-literal syntax, since you don't have
+ simpler to use than the string-literal syntax since you don't have
 to worry about multiple layers of quoting. We already used this
 method above:
@@ -169,7 +169,8 @@ SELECT item.name FROM on_hand WHERE item.price > 9.99;
 This will not work since the name item is taken to be a table
- name, not a field name, per SQL syntax rules. You must write it like this:
+ name, not a column name of on_hand, per SQL syntax rules.
+ You must write it like this: SELECT (item).name FROM on_hand WHERE (item).price > 9.99; @@ -195,7 +196,7 @@ SELECT (on_hand.item).name FROM on_hand WHERE (on_hand.item).price > 9.99; SELECT (my_func(...)).field FROM ... - Without the extra parentheses, this will provoke a syntax error. + Without the extra parentheses, this will generate a syntax error. @@ -249,7 +250,7 @@ INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2); The external text representation of a composite value consists of items that are interpreted according to the I/O conversion rules for the individual field types, plus decoration that indicates the composite structure. - The decoration consists of parentheses (( and )) + The decoration consists of parentheses around the whole value, plus commas (,) between adjacent items. Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be @@ -263,7 +264,7 @@ INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2); - As shown previously, when writing a composite value you can write double + As shown previously, when writing a composite value you can use double quotes around any individual field value. You must do so if the field value would otherwise confuse the composite-value parser. In particular, fields containing @@ -272,7 +273,8 @@ INSERT INTO mytab (complex_col.r, complex_col.i) VALUES(1.1, 2.2); precede it with a backslash. (Also, a pair of double quotes within a double-quoted field value is taken to represent a double quote character, analogously to the rules for single quotes in SQL literal strings.) - Alternatively, you can use backslash-escaping to protect all data characters + Alternatively, you can avoid quoting and use backslash-escaping to + protect all data characters that would otherwise be taken as composite syntax. diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 4580ed3fb3..6cd5b7ce60 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1,4 +1,4 @@ - + Server Setup and Operation @@ -76,7 +76,7 @@ linkend="app-initdb">,initdb which is installed with PostgreSQL. The desired file system location of your database cluster is indicated by the - option, for example + option, for example: $ initdb -D /usr/local/pgsql/data @@ -382,7 +382,7 @@ FATAL: could not create TCP/IP listen socket - A message like + A message like: FATAL: could not create shared memory segment: Invalid argument DETAIL: Failed system call was shmget(key=5440001, size=4011376640, 03600). @@ -401,7 +401,7 @@ DETAIL: Failed system call was shmget(key=5440001, size=4011376640, 03600). - An error like + An error like: FATAL: could not create semaphores: No space left on device DETAIL: Failed system call was semget(5440126, 17, 03600). diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml index fe581f39be..426ab01fe1 100644 --- a/doc/src/sgml/sources.sgml +++ b/doc/src/sgml/sources.sgml @@ -1,4 +1,4 @@ - + PostgreSQL Coding Conventions @@ -661,10 +661,10 @@ BETTER: unrecognized node type: 42 May vs. Can vs. Might - May suggests permission (e.g. "You may borrow my rake."), + May suggests permission (e.g., "You may borrow my rake."), and has little use in documentation or error messages. - Can suggests ability (e.g. "I can lift that log."), - and might suggests possibility (e.g. "It might rain + Can suggests ability (e.g., "I can lift that log."), + and might suggests possibility (e.g., "It might rain today."). 
Using the proper word clarifies meaning and assists translation. diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml index 6c63708b0e..62f669835c 100644 --- a/doc/src/sgml/sql.sgml +++ b/doc/src/sgml/sql.sgml @@ -1,4 +1,4 @@ - + SQL @@ -95,7 +95,7 @@ as SQL3 is under development. It is planned to make SQL a Turing-complete - language, i.e. all computable queries (e.g. recursive queries) will be + language, i.e., all computable queries (e.g., recursive queries) will be possible. This has now been completed as SQL:2003. @@ -761,7 +761,7 @@ x(A) ∣ F(x) The relational algebra and the relational calculus have the same - expressive power; i.e. all queries that + expressive power; i.e., all queries that can be formulated using relational algebra can also be formulated using the relational calculus and vice versa. This was first proved by E. F. Codd in @@ -811,7 +811,7 @@ x(A) ∣ F(x) Arithmetic capability: In SQL it is possible to involve - arithmetic operations as well as comparisons, e.g. + arithmetic operations as well as comparisons, e.g.: A < B + 3. @@ -1027,7 +1027,7 @@ SELECT S.SNAME, P.PNAME SUPPLIER × PART × SELLS is derived. Now only those tuples satisfying the - conditions given in the WHERE clause are selected (i.e. the common + conditions given in the WHERE clause are selected (i.e., the common named attributes have to be equal). Finally we project out all columns but S.SNAME and P.PNAME. @@ -1312,7 +1312,7 @@ SELECT COUNT(PNO) SQL allows one to partition the tuples of a table into groups. Then the aggregate functions described above can be applied to the groups — - i.e. the value of the aggregate function is no longer calculated over + i.e., the value of the aggregate function is no longer calculated over all the values of the specified column but over all values of a group. Thus the aggregate function is evaluated separately for every group. @@ -1517,7 +1517,7 @@ SELECT * If we want to know all suppliers that do not sell any part - (e.g. to be able to remove these suppliers from the database) we use: + (e.g., to be able to remove these suppliers from the database) we use: SELECT * @@ -1533,7 +1533,7 @@ SELECT * sells at least one part. Note that we use S.SNO from the outer SELECT within the WHERE clause of the inner SELECT. Here the subquery must be evaluated - afresh for each tuple from the outer query, i.e. the value for + afresh for each tuple from the outer query, i.e., the value for S.SNO is always taken from the current tuple of the outer SELECT. @@ -1811,7 +1811,7 @@ CREATE INDEX I ON SUPPLIER (SNAME); - The created index is maintained automatically, i.e. whenever a new + The created index is maintained automatically, i.e., whenever a new tuple is inserted into the relation SUPPLIER the index I is adapted. Note that the only changes a user can perceive when an index is present are increased speed for SELECT @@ -1826,7 +1826,7 @@ CREATE INDEX I ON SUPPLIER (SNAME); A view can be regarded as a virtual table, - i.e. a table that + i.e., a table that does not physically exist in the database but looks to the user as if it does. By contrast, when we talk of a @@ -1838,7 +1838,7 @@ CREATE INDEX I ON SUPPLIER (SNAME); Views do not have their own, physically separate, distinguishable stored data. Instead, the system stores the definition of the - view (i.e. the rules about how to access physically stored base + view (i.e., the rules about how to access physically stored base tables in order to materialize the view) somewhere in the system catalogs (see ). 
For a
@@ -2082,7 +2082,7 @@ DELETE FROM SUPPLIER
 In this section we will sketch how SQL can be
- embedded into a host language (e.g. C).
+ embedded into a host language (e.g., C).
 There are two main reasons why we want to use SQL
 from a host language:
@@ -2090,7 +2090,7 @@ DELETE FROM SUPPLIER
 There are queries that cannot be formulated using pure SQL
- (i.e. recursive queries). To be able to perform such queries we need a
+ (i.e., recursive queries). To be able to perform such queries we need a
 host language with a greater expressive power than
 SQL.
@@ -2099,7 +2099,7 @@ DELETE FROM SUPPLIER
 We simply want to access a database from some application that
- is written in the host language (e.g. a ticket reservation system
+ is written in the host language (e.g., a ticket reservation system
 with a graphical user interface is written in C and the information
 about which tickets are still left is stored in a database that can be
 accessed using embedded SQL).
diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml
index 67d105cbe0..11bd7895d1 100644
--- a/doc/src/sgml/start.sgml
+++ b/doc/src/sgml/start.sgml
@@ -1,4 +1,4 @@
- +
 Getting Started
@@ -74,7 +74,7 @@
 A server process, which manages the database files, accepts
 connections to the database from client applications, and
- performs actions on the database on behalf of the clients. The
+ performs database actions on behalf of the clients. The
 database server program is called
 postgres.
 postgres
@@ -108,7 +108,7 @@
 The PostgreSQL server can handle
- multiple concurrent connections from clients. For that purpose it
+ multiple concurrent connections from clients. To achieve this it
 starts (forks) a new process for each connection.
 From that point on, the client and the new server process
 communicate without intervention by the original
@@ -159,25 +159,26 @@
- If you see a message similar to
+ If you see a message similar to:
createdb: command not found
 then PostgreSQL was not installed properly. Either it was not
- installed at all or the search path was not set correctly. Try
+ installed at all or your shell's search path was not set correctly. Try
 calling the command with an absolute path instead:
$ /usr/local/pgsql/bin/createdb mydb
 The path at your site might be different. Contact your site
- administrator or check back in the installation instructions to
+ administrator or check the installation instructions to
 correct the situation.
 Another response could be this:
-createdb: could not connect to database postgres: could not connect to server: No such file or directory
+createdb: could not connect to database postgres: could not connect
+to server: No such file or directory
 Is the server running locally and accepting
 connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
@@ -246,7 +247,7 @@ createdb: database creation failed: ERROR: permission denied to create database
 length. A convenient choice is to create a database with the same
 name as your current user name. Many tools assume that database
 name as the default, so it can save you some typing. To create
- that database, simply type
+ that database, simply type:
$ createdb
@@ -299,7 +300,7 @@ createdb: database creation failed: ERROR: permission denied to create database
 Using an existing graphical frontend tool like
 pgAdmin or an office suite with
- ODBC support to create and manipulate a
+ ODBC or JDBC support to create and manipulate a
 database. These possibilities are not covered in this
 tutorial.
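As a side note to the createdb discussion above: createdb is a wrapper around the SQL command CREATE DATABASE, so the same effect can be had from any SQL session. A minimal sketch, with mydb standing in for any database name:

CREATE DATABASE mydb;  -- what createdb mydb issues on your behalf
DROP DATABASE mydb;    -- removes the database again when it is no longer needed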
@@ -314,15 +315,15 @@ createdb: database creation failed: ERROR: permission denied to create database - You probably want to start up psql, to try out + You probably want to start up psql to try the examples in this tutorial. It can be activated for the mydb database by typing the command: $ psql mydb - If you leave off the database name then it will default to your + If you do not supply the database name then it will default to your user account name. You already discovered this scheme in the - previous section. + previous section using createdb. @@ -335,15 +336,15 @@ Type "help" for help. mydb=> superuser - The last line could also be + The last line could also be: mydb=# That would mean you are a database superuser, which is most likely the case if you installed PostgreSQL yourself. Being a superuser means that you are not subject to - access controls. For the purposes of this tutorial that is not of - importance. + access controls. For the purposes of this tutorial that is not + important. @@ -395,7 +396,7 @@ mydb=# - To get out of psql, type + To get out of psql, type: mydb=> \q @@ -407,7 +408,7 @@ mydb=# installed correctly you can also type man psql at the operating system shell prompt to see the documentation. In this tutorial we will not use these features explicitly, but you - can use them yourself when you see fit. + can use them yourself when it is helpful. diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 744a13ef47..48bf5a4feb 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -1,4 +1,4 @@ - + SQL Syntax @@ -11,12 +11,12 @@ This chapter describes the syntax of SQL. It forms the foundation for understanding the following chapters which will go into detail - about how the SQL commands are applied to define and modify data. + about how SQL commands are applied to define and modify data. We also advise users who are already familiar with SQL to read this - chapter carefully because there are several rules and concepts that + chapter carefully because it contains several rules and concepts that are implemented inconsistently among SQL databases or that are specific to PostgreSQL. @@ -293,7 +293,7 @@ U&"d!0061t!+000061" UESCAPE '!' bounded by single quotes ('), for example 'This is a string'. To include a single-quote character within a string constant, - write two adjacent single quotes, e.g. + write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character ("). @@ -337,7 +337,7 @@ SELECT 'foo' 'bar'; string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single - quote, e.g. E'foo'. (When continuing an escape string + quote, e.g., E'foo'. (When continuing an escape string constant across lines, write E only before the first opening quote.) Within an escape string, a backslash character (\) begins a @@ -422,14 +422,14 @@ SELECT 'foo' 'bar'; is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. This is for backward - compatibility with the historical behavior, in which backslash escapes + compatibility with the historical behavior, where backslash escapes were always recognized. Although standard_conforming_strings currently defaults to off, the default will change to on in a future release for improved standards compliance. Applications are therefore encouraged to migrate away from using backslash escapes. 
If you need to use a backslash escape to represent a special character, write the - constant with an E to be sure it will be handled the same + string constant with an E to be sure it will be handled the same way in future releases. @@ -442,7 +442,7 @@ SELECT 'foo' 'bar'; - The character with the code zero cannot be in a string constant. + The zero-byte (null byte) character cannot be in a string constant. @@ -896,7 +896,7 @@ CAST ( 'string' AS type ) - A comment is an arbitrary sequence of characters beginning with + A comment is a sequence of characters beginning with double dashes and extending to the end of the line, e.g.: -- This is a standard SQL comment @@ -918,8 +918,8 @@ CAST ( 'string' AS type ) - A comment is removed from the input stream before further syntax - analysis and is effectively replaced by whitespace. + Comment are removed from the input stream before further syntax + analysis and are effectively replaced by whitespace. @@ -1112,7 +1112,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; the OPERATOR construct is taken to have the default precedence shown in for any other operator. This is true no matter - which specific operator name appears inside OPERATOR(). + which specific operator appears inside OPERATOR(). @@ -1154,80 +1154,80 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; - A constant or literal value. + A constant or literal value - A column reference. + A column reference A positional parameter reference, in the body of a function definition - or prepared statement. + or prepared statement - A subscripted expression. + A subscripted expression - A field selection expression. + A field selection expression - An operator invocation. + An operator invocation - A function call. + A function call - An aggregate expression. + An aggregate expression - A window function call. + A window function call - A type cast. + A type cast - A scalar subquery. + A scalar subquery - An array constructor. + An array constructor - A row constructor. + A row constructor @@ -1264,7 +1264,7 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; - A column can be referenced in the form + A column can be referenced in the form: correlation.columnname @@ -1426,7 +1426,7 @@ $1.somecolumn where the operator token follows the syntax rules of , or is one of the key words AND, OR, and - NOT, or is a qualified operator name in the form + NOT, or is a qualified operator name in the form: OPERATOR(schema.operatorname) @@ -1714,7 +1714,7 @@ CAST ( expression AS type casts that are marked OK to apply implicitly in the system catalogs. Other casts must be invoked with explicit casting syntax. This restriction is intended to prevent - surprising conversions from being applied silently. + surprising conversions from being silently applied. @@ -1730,7 +1730,7 @@ CAST ( expression AS type timestamp can only be used in this fashion if they are double-quoted, because of syntactic conflicts. Therefore, the use of the function-like cast syntax leads to inconsistencies and should - probably be avoided in new applications. + probably be avoided. @@ -1794,7 +1794,7 @@ SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name) An array constructor is an expression that builds an - array value from values for its member elements. A simple array + array using values for its member elements. 
A simple array constructor consists of the key word ARRAY, a left square bracket [, a list of expressions (separated by commas) for the @@ -1925,8 +1925,8 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); - A row constructor is an expression that builds a row value (also - called a composite value) from values + A row constructor is an expression that builds a row (also + called a composite value) using values for its member fields. A row constructor consists of the key word ROW, a left parenthesis, zero or more expressions (separated by commas) for the row field values, and finally diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index d813fbcaf8..d3e7a148ea 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -1,4 +1,4 @@ - + Full Text Search @@ -74,7 +74,7 @@ Parsing documents into tokens. It is - useful to identify various classes of tokens, e.g. numbers, words, + useful to identify various classes of tokens, e.g., numbers, words, complex words, email addresses, so that they can be processed differently. In principle token classes depend on the specific application, but for most purposes it is adequate to use a predefined @@ -323,7 +323,7 @@ text @@ text The above are all simple text search examples. As mentioned before, full text search functionality includes the ability to do many more things: skip indexing certain words (stop words), process synonyms, and use - sophisticated parsing, e.g. parse based on more than just white space. + sophisticated parsing, e.g., parse based on more than just white space. This functionality is controlled by text search configurations. PostgreSQL comes with predefined configurations for many languages, and you can easily create your own @@ -389,7 +389,7 @@ text @@ text Text search parsers and templates are built from low-level C functions; - therefore it requires C programming ability to develop new ones, and + therefore C programming ability is required to develop new ones, and superuser privileges to install one into a database. (There are examples of add-on parsers and templates in the contrib/ area of the PostgreSQL distribution.) Since dictionaries and @@ -416,7 +416,7 @@ text @@ text Searching a Table - It is possible to do full text search with no index. A simple query + It is possible to do a full text search without an index. A simple query to print the title of each row that contains the word friend in its body field is: @@ -455,7 +455,8 @@ WHERE to_tsvector(body) @@ to_tsquery('friend'); SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') -ORDER BY last_mod_date DESC LIMIT 10; +ORDER BY last_mod_date DESC +LIMIT 10; For clarity we omitted the coalesce function calls @@ -518,7 +519,7 @@ CREATE INDEX pgweb_idx ON pgweb USING gin(to_tsvector(config_name, body)); recording which configuration was used for each index entry. This would be useful, for example, if the document collection contained documents in different languages. Again, - queries that are to use the index must be phrased to match, e.g. + queries that wish to use the index must be phrased to match, e.g., WHERE to_tsvector(config_name, body) @@ 'a & b'. 
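For instance, a query against the two-column index shown above must repeat the same two-argument to_tsvector call for the index to be usable; a sketch using the illustrative pgweb table and an arbitrary search:

SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('create & table');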
@@ -555,7 +556,8 @@ CREATE INDEX textsearch_idx ON pgweb USING gin(textsearchable_index_col); SELECT title FROM pgweb WHERE textsearchable_index_col @@ to_tsquery('create & table') -ORDER BY last_mod_date DESC LIMIT 10; +ORDER BY last_mod_date DESC +LIMIT 10; @@ -840,7 +842,7 @@ SELECT plainto_tsquery('english', 'The Fat & Rats:C'); document, and how important is the part of the document where they occur. However, the concept of relevancy is vague and very application-specific. Different applications might require additional information for ranking, - e.g. document modification time. The built-in ranking functions are only + e.g., document modification time. The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs. @@ -877,7 +879,8 @@ SELECT plainto_tsquery('english', 'The Fat & Rats:C'); - ts_rank_cd( weights float4[], vector tsvector, query tsquery , normalization integer ) returns float4 + ts_rank_cd( weights float4[], vector tsvector, + query tsquery , normalization integer ) returns float4 @@ -921,13 +924,13 @@ SELECT plainto_tsquery('english', 'The Fat & Rats:C'); Typically weights are used to mark words from special areas of the - document, like the title or an initial abstract, so that they can be - treated as more or less important than words in the document body. + document, like the title or an initial abstract, so they can be + treated with more or less importance than words in the document body. Since a longer document has a greater chance of containing a query term - it is reasonable to take into account document size, e.g. a hundred-word + it is reasonable to take into account document size, e.g., a hundred-word document with five instances of a search word is probably more relevant than a thousand-word document with five instances. Both ranking functions take an integer normalization option that @@ -996,7 +999,8 @@ SELECT plainto_tsquery('english', 'The Fat & Rats:C'); SELECT title, ts_rank_cd(textsearch, query) AS rank FROM apod, to_tsquery('neutrino|(dark & matter)') query WHERE query @@ textsearch -ORDER BY rank DESC LIMIT 10; +ORDER BY rank DESC +LIMIT 10; title | rank -----------------------------------------------+---------- Neutrinos in the Sun | 3.1 @@ -1017,7 +1021,8 @@ ORDER BY rank DESC LIMIT 10; SELECT title, ts_rank_cd(textsearch, query, 32 /* rank/(rank+1) */ ) AS rank FROM apod, to_tsquery('neutrino|(dark & matter)') query WHERE query @@ textsearch -ORDER BY rank DESC LIMIT 10; +ORDER BY rank DESC +LIMIT 10; title | rank -----------------------------------------------+------------------- Neutrinos in the Sun | 0.756097569485493 @@ -1037,7 +1042,7 @@ ORDER BY rank DESC LIMIT 10; Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since - practical queries often result in large numbers of matches. + practical queries often result in a large number of matches. @@ -1063,7 +1068,7 @@ ORDER BY rank DESC LIMIT 10; ts_headline accepts a document along - with a query, and returns an excerpt from + with a query, and returns an excerpt of the document in which terms from the query are highlighted. 
The configuration to be used to parse the document can be specified by config; if config @@ -1080,8 +1085,8 @@ ORDER BY rank DESC LIMIT 10; - StartSel, StopSel: the strings with which - query words appearing in the document should be delimited to distinguish + StartSel, StopSel: the strings to delimit + query words appearing in the document, to distinguish them from other excerpted words. You must double-quote these strings if they contain spaces or commas. @@ -1183,7 +1188,8 @@ SELECT id, ts_headline(body, q), rank FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank FROM apod, to_tsquery('stars') q WHERE ti @@ q - ORDER BY rank DESC LIMIT 10) AS foo; + ORDER BY rank DESC + LIMIT 10) AS foo; @@ -1267,7 +1273,7 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank - This function returns a copy of the input vector in which every + setweight returns a copy of the input vector in which every position has been labeled with the given weight, either A, B, C, or D. (D is the default for new @@ -1467,7 +1473,7 @@ SELECT querytree(to_tsquery('!defined')); The ts_rewrite family of functions search a given tsquery for occurrences of a target - subquery, and replace each occurrence with another + subquery, and replace each occurrence with a substitute subquery. In essence this operation is a tsquery-specific version of substring replacement. A target and substitute combination can be @@ -1567,7 +1573,9 @@ SELECT ts_rewrite(to_tsquery('supernovae & crab'), 'SELECT * FROM aliases'); We can change the rewriting rules just by updating the table: -UPDATE aliases SET s = to_tsquery('supernovae|sn & !nebulae') WHERE t = to_tsquery('supernovae'); +UPDATE aliases +SET s = to_tsquery('supernovae|sn & !nebulae') +WHERE t = to_tsquery('supernovae'); SELECT ts_rewrite(to_tsquery('supernovae & crab'), 'SELECT * FROM aliases'); ts_rewrite @@ -1578,7 +1586,7 @@ SELECT ts_rewrite(to_tsquery('supernovae & crab'), 'SELECT * FROM aliases'); Rewriting can be slow when there are many rewriting rules, since it - checks every rule for a possible hit. To filter out obvious non-candidate + checks every rule for a possible match. To filter out obvious non-candidate rules we can use the containment operators for the tsquery type. In the example below, we select only those rules which might match the original query: @@ -1670,9 +1678,9 @@ SELECT title, body FROM messages WHERE tsv @@ to_tsquery('title & body'); - A limitation of the built-in triggers is that they treat all the + A limitation of built-in triggers is that they treat all the input columns alike. To process columns differently — for - example, to weight title differently from body — it is necessary + example, to weigh title differently from body — it is necessary to write a custom trigger. Here is an example using PL/pgSQL as the trigger language: @@ -1714,11 +1722,13 @@ ON messages FOR EACH ROW EXECUTE PROCEDURE messages_trigger(); - ts_stat(sqlquery text, weights text, OUT word text, OUT ndoc integer, OUT nentry integer) returns setof record + ts_stat(sqlquery text, weights text, + OUT word text, OUT ndoc integer, + OUT nentry integer) returns setof record - sqlquery is a text value containing a SQL + sqlquery is a text value containing an SQL query which must return a single tsvector column. ts_stat executes the query and returns statistics about each distinct lexeme (word) contained in the tsvector @@ -1930,7 +1940,7 @@ LIMIT 10; only the basic ASCII letters are reported as a separate token type, since it is sometimes useful to distinguish them. 
In most European languages, token types word and asciiword - should always be treated alike. + should be treated alike. @@ -2077,7 +2087,7 @@ SELECT alias, description, token FROM ts_debug('http://example.com/stuff/index.h by the parser, each dictionary in the list is consulted in turn, until some dictionary recognizes it as a known word. If it is identified as a stop word, or if no dictionary recognizes the token, it will be - discarded and not indexed or searched for. + discarded and not indexed or searched. The general rule for configuring a list of dictionaries is to place first the most narrow, most specific dictionary, then the more general dictionaries, finishing with a very general dictionary, like @@ -2268,7 +2278,8 @@ CREATE TEXT SEARCH DICTIONARY my_synonym ( ); ALTER TEXT SEARCH CONFIGURATION english - ALTER MAPPING FOR asciiword WITH my_synonym, english_stem; + ALTER MAPPING FOR asciiword + WITH my_synonym, english_stem; SELECT * FROM ts_debug('english', 'Paris'); alias | description | token | dictionaries | dictionary | lexemes @@ -2428,7 +2439,8 @@ CREATE TEXT SEARCH DICTIONARY thesaurus_simple ( ALTER TEXT SEARCH CONFIGURATION russian - ALTER MAPPING FOR asciiword, asciihword, hword_asciipart WITH thesaurus_simple; + ALTER MAPPING FOR asciiword, asciihword, hword_asciipart + WITH thesaurus_simple; @@ -2457,7 +2469,8 @@ CREATE TEXT SEARCH DICTIONARY thesaurus_astro ( ); ALTER TEXT SEARCH CONFIGURATION russian - ALTER MAPPING FOR asciiword, asciihword, hword_asciipart WITH thesaurus_astro, english_stem; + ALTER MAPPING FOR asciiword, asciihword, hword_asciipart + WITH thesaurus_astro, english_stem; Now we can see how it works. @@ -2520,7 +2533,7 @@ SELECT plainto_tsquery('supernova star'); morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and - conjugations of the search term bank, e.g. + conjugations of the search term bank, e.g., banking, banked, banks, banks', and bank's. @@ -2567,9 +2580,8 @@ CREATE TEXT SEARCH DICTIONARY english_ispell ( - Ispell dictionaries support splitting compound words. - This is a nice feature and - PostgreSQL supports it. + Ispell dictionaries support splitting compound words; + a useful feature. Notice that the affix file should specify a special flag using the compoundwords controlled statement that marks dictionary words that can participate in compound formation: @@ -2603,8 +2615,8 @@ SELECT ts_lexize('norwegian_ispell', 'sjokoladefabrikk'); <application>Snowball</> Dictionary - The Snowball dictionary template is based on the project - of Martin Porter, inventor of the popular Porter's stemming algorithm + The Snowball dictionary template is based on a project + by Martin Porter, inventor of the popular Porter's stemming algorithm for the English language. Snowball now provides stemming algorithms for many languages (see the Snowball site for more information). Each algorithm understands how to @@ -2668,7 +2680,7 @@ CREATE TEXT SEARCH DICTIONARY english_stem ( As an example, we will create a configuration - pg, starting from a duplicate of the built-in + pg by duplicating the built-in english configuration. @@ -2767,7 +2779,7 @@ SHOW default_text_search_config; The behavior of a custom text search configuration can easily become - complicated enough to be confusing or undesirable. The functions described + confusing. The functions described in this section are useful for testing text search objects. 
You can test a complete configuration, or test parsers and dictionaries separately. @@ -2938,7 +2950,7 @@ SELECT * FROM ts_debug('public.english','The Brightest supernovaes'); - You can reduce the volume of output by explicitly specifying which columns + You can reduce the width of the output by explicitly specifying which columns you want to see: @@ -2968,8 +2980,10 @@ FROM ts_debug('public.english','The Brightest supernovaes'); - ts_parse(parser_name text, document text, OUT tokid integer, OUT token text) returns setof record - ts_parse(parser_oid oid, document text, OUT tokid integer, OUT token text) returns setof record + ts_parse(parser_name text, document text, + OUT tokid integer, OUT token text) returns setof record + ts_parse(parser_oid oid, document text, + OUT tokid integer, OUT token text) returns setof record @@ -2997,8 +3011,10 @@ SELECT * FROM ts_parse('default', '123 - a number'); - ts_token_type(parser_name text, OUT tokid integer, OUT alias text, OUT description text) returns setof record - ts_token_type(parser_oid oid, OUT tokid integer, OUT alias text, OUT description text) returns setof record + ts_token_type(parser_name text, OUT tokid integer, + OUT alias text, OUT description text) returns setof record + ts_token_type(parser_oid oid, OUT tokid integer, + OUT alias text, OUT description text) returns setof record @@ -3121,11 +3137,11 @@ SELECT plainto_tsquery('supernovae stars'); - There are two kinds of indexes that can be used to speed up full text + There are two kinds of indexes which can be used to speed up full text searches. Note that indexes are not mandatory for full text searching, but in - cases where a column is searched on a regular basis, an index will - usually be desirable. + cases where a column is searched on a regular basis, an index is + usually desirable. @@ -3179,7 +3195,7 @@ SELECT plainto_tsquery('supernovae stars'); There are substantial performance differences between the two index types, - so it is important to understand which to use. + so it is important to understand their characteristics. @@ -3188,7 +3204,7 @@ SELECT plainto_tsquery('supernovae stars'); to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.) GiST indexes are lossy because each document is represented in the - index by a fixed-length signature. The signature is generated by hashing + index using a fixed-length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in @@ -3197,7 +3213,7 @@ SELECT plainto_tsquery('supernovae stars'); - Lossiness causes performance degradation due to useless fetches of table + Lossiness causes performance degradation due to unnecessary fetches of table records that turn out to be false matches. Since random access to table records is slow, this limits the usefulness of GiST indexes. The likelihood of false matches depends on several factors, in particular the @@ -3284,7 +3300,7 @@ SELECT plainto_tsquery('supernovae stars'); - The optional parameter PATTERN should be the name of + The optional parameter PATTERN can be the name of a text search object, optionally schema-qualified. If PATTERN is omitted then information about all visible objects will be displayed. 
PATTERN can be a @@ -3565,7 +3581,7 @@ Parser: "pg_catalog.default" Text search configuration setup is completely different now. Instead of manually inserting rows into configuration tables, search is configured through the specialized SQL commands shown - earlier in this chapter. There is not currently any automated + earlier in this chapter. There is no automated support for converting an existing custom configuration for 8.3; you're on your own here. diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml index 3829275276..beb74f9a57 100644 --- a/doc/src/sgml/typeconv.sgml +++ b/doc/src/sgml/typeconv.sgml @@ -1,4 +1,4 @@ - + Type Conversion @@ -10,15 +10,15 @@ SQL statements can, intentionally or not, require -mixing of different data types in the same expression. +the mixing of different data types in the same expression. PostgreSQL has extensive facilities for evaluating mixed-type expressions. -In many cases a user will not need +In many cases a user does not need to understand the details of the type conversion mechanism. -However, the implicit conversions done by PostgreSQL +However, implicit conversions done by PostgreSQL can affect the results of a query. When necessary, these results can be tailored by using explicit type conversion. @@ -38,21 +38,21 @@ operators. SQL is a strongly typed language. That is, every data item has an associated data type which determines its behavior and allowed usage. PostgreSQL has an extensible type system that is -much more general and flexible than other SQL implementations. +more general and flexible than other SQL implementations. Hence, most type conversion behavior in PostgreSQL is governed by general rules rather than by ad hoc -heuristics. This allows -mixed-type expressions to be meaningful even with user-defined types. +heuristics. This allows the use of mixed-type expressions even with +user-defined types. The PostgreSQL scanner/parser divides lexical -elements into only five fundamental categories: integers, non-integer numbers, +elements into five fundamental categories: integers, non-integer numbers, strings, identifiers, and key words. Constants of most non-numeric types are first classified as strings. The SQL language definition allows specifying type names with strings, and this mechanism can be used in PostgreSQL to start the parser down the correct -path. For example, the query +path. For example, the query: SELECT text 'Origin' AS "label", point '(0,0)' AS "value"; @@ -99,7 +99,7 @@ Operators PostgreSQL allows expressions with prefix and postfix unary (one-argument) operators, as well as binary (two-argument) operators. Like functions, operators can -be overloaded, and so the same problem of selecting the right operator +be overloaded, so the same problem of selecting the right operator exists. @@ -136,13 +136,13 @@ and for the GREATEST and LEAST functions. -The system catalogs store information about which conversions, called -casts, between data types are valid, and how to +The system catalogs store information about which conversions, or +casts, exist between which data types, and how to perform those conversions. Additional casts can be added by the user with the command. (This is usually done in conjunction with defining new data types. The set of casts -between the built-in types has been carefully crafted and is best not +between built-in types has been carefully crafted and is best not altered.) @@ -152,8 +152,8 @@ altered.) 
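To sketch what adding such a cast looks like (the type mytext and the function mytext_to_text here are hypothetical, used only for illustration):

CREATE CAST (mytext AS text)
    WITH FUNCTION mytext_to_text(mytext)
    AS IMPLICIT;

Declaring the cast AS IMPLICIT lets the parser apply it automatically during type resolution; without that clause the cast is used only in explicit casts, or in assignment contexts if AS ASSIGNMENT is given instead.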
-An additional heuristic is provided in the parser to allow better guesses -at proper casting behavior among groups of types that have implicit casts. +An additional heuristic provided by the parser allows improved determination +of the proper casting behavior among groups of types that have implicit casts. Data types are divided into several basic type categories, including boolean, numeric, string, bitstring, datetime, @@ -161,7 +161,7 @@ categories, including boolean, numeric, user-defined. (For a list see ; but note it is also possible to create custom type categories.) Within each category there can be one or more preferred types, which -are preferentially selected when there is ambiguity. With careful selection +are selected when there is ambiguity. With careful selection of preferred types and available implicit casts, it is possible to ensure that ambiguous expressions (those with multiple candidate parsing solutions) can be resolved in a useful way. @@ -179,17 +179,17 @@ Implicit conversions should never have surprising or unpredictable outcomes. -There should be no extra overhead from the parser or executor +There should be no extra overhead in the parser or executor if a query does not need implicit type conversion. -That is, if a query is well formulated and the types already match up, then the query should proceed +That is, if a query is well-formed and the types already match, then the query should execute without spending extra time in the parser and without introducing unnecessary implicit conversion -calls into the query. +calls in the query. Additionally, if a query usually requires an implicit conversion for a function, and if then the user defines a new function with the correct argument types, the parser -should use this new function and will no longer do the implicit conversion using the old function. +should use this new function and no longer do implicit conversion using the old function. @@ -206,9 +206,8 @@ should use this new function and will no longer do the implicit conversion using - The specific operator to be used in an operator invocation is determined - by following - the procedure below. Note that this procedure is indirectly affected + The specific operator invoked is determined by the following + steps. Note that this procedure is affected by the precedence of the involved operators. See for more information. @@ -219,9 +218,9 @@ should use this new function and will no longer do the implicit conversion using Select the operators to be considered from the -pg_operator system catalog. If an unqualified +pg_operator system catalog. If a non-schema-qualified operator name was used (the usual case), the operators -considered are those of the right name and argument count that are +considered are those with a matching name and argument count that are visible in the current search path (see ). If a qualified operator name was given, only operators in the specified schema are considered. @@ -230,8 +229,8 @@ schema are considered. -If the search path finds multiple operators of identical argument types, -only the one appearing earliest in the path is considered. But operators of +If the search path finds multiple operators with identical argument types, +only the one appearing earliest in the path is considered. Operators with different argument types are considered on an equal footing regardless of search path position. @@ -251,7 +250,7 @@ operators considered), use it. 
If one argument of a binary operator invocation is of the unknown type, then assume it is the same type as the other argument for this check. -Other cases involving unknown will never find a match at +Cases involving two unknown types will never find a match at this step. @@ -276,7 +275,7 @@ candidate remains, use it; else continue to the next step. Run through all candidates and keep those with the most exact matches on input types. (Domains are considered the same as their base type -for this purpose.) Keep all candidates if none have any exact matches. +for this purpose.) Keep all candidates if none have exact matches. If only one candidate remains, use it; else continue to the next step. @@ -296,7 +295,7 @@ categories accepted at those argument positions by the remaining candidates. At each position, select the string category if any candidate accepts that category. (This bias towards string is appropriate -since an unknown-type literal does look like a string.) Otherwise, if +since an unknown-type literal looks like a string.) Otherwise, if all the remaining candidates accept the same type category, select that category; otherwise fail because the correct choice cannot be deduced without more clues. Now discard @@ -339,7 +338,7 @@ SELECT 40 ! AS "40 factorial"; So the parser does a type conversion on the operand and the query -is equivalent to +is equivalent to: SELECT CAST(40 AS bigint) ! AS "40 factorial"; @@ -351,7 +350,7 @@ SELECT CAST(40 AS bigint) ! AS "40 factorial"; String Concatenation Operator Type Resolution -A string-like syntax is used for working with string types as well as for +A string-like syntax is used for working with string types and for working with complex extension types. Strings with unspecified type are matched with likely operator candidates. @@ -371,7 +370,7 @@ SELECT text 'abc' || 'def' AS "text and unknown"; In this case the parser looks to see if there is an operator taking text for both arguments. Since there is, it assumes that the second argument should -be interpreted as of type text. +be interpreted as type text. @@ -391,9 +390,9 @@ In this case there is no initial hint for which type to use, since no types are specified in the query. So, the parser looks for all candidate operators and finds that there are candidates accepting both string-category and bit-string-category inputs. Since string category is preferred when available, -that category is selected, and then the +that category is selected, and the preferred type for strings, text, is used as the specific -type to resolve the unknown literals to. +type to resolve the unknown literals. @@ -460,7 +459,7 @@ SELECT ~ CAST('20' AS int8) AS "negation"; - The specific function to be used in a function invocation is determined + The specific function to be invoked is determined according to the following steps. @@ -470,9 +469,9 @@ SELECT ~ CAST('20' AS int8) AS "negation"; Select the functions to be considered from the -pg_proc system catalog. If an unqualified +pg_proc system catalog. If a non-schema-qualified function name was used, the functions -considered are those of the right name and argument count that are +considered are those with a matching name and argument count that are visible in the current search path (see ). If a qualified function name was given, only functions in the specified schema are considered. @@ -482,7 +481,7 @@ schema are considered. If the search path finds multiple functions of identical argument types, -only the one appearing earliest in the path is considered. 
But functions of +only the one appearing earliest in the path is considered. Functions of different argument types are considered on an equal footing regardless of search path position. @@ -527,7 +526,7 @@ this step.) -If no exact match is found, see whether the function call appears +If no exact match is found, see if the function call appears to be a special type conversion request. This happens if the function call has just one argument and the function name is the same as the (internal) name of some data type. Furthermore, the function argument must be either @@ -555,7 +554,7 @@ Look for the best match. -Discard candidate functions for which the input types do not match +Discard candidate functions in which the input types do not match and cannot be converted (using an implicit conversion) to match. unknown literals are assumed to be convertible to anything for this purpose. If only one @@ -566,7 +565,7 @@ candidate remains, use it; else continue to the next step. Run through all candidates and keep those with the most exact matches on input types. (Domains are considered the same as their base type -for this purpose.) Keep all candidates if none have any exact matches. +for this purpose.) Keep all candidates if none have exact matches. If only one candidate remains, use it; else continue to the next step. @@ -586,7 +585,7 @@ accepted at those argument positions by the remaining candidates. At each position, select the string category if any candidate accepts that category. (This bias towards string -is appropriate since an unknown-type literal does look like a string.) +is appropriate since an unknown-type literal looks like a string.) Otherwise, if all the remaining candidates accept the same type category, select that category; otherwise fail because the correct choice cannot be deduced without more clues. @@ -616,9 +615,9 @@ Some examples follow. Rounding Function Argument Type Resolution -There is only one round function with two -arguments. (The first is numeric, the second is -integer.) So the following query automatically converts +There is only one round function which takes two +arguments; it takes a first argument of numeric and +a second argument of integer. So the following query automatically converts the first argument of type integer to numeric: @@ -631,7 +630,7 @@ SELECT round(4, 4); (1 row) -That query is actually transformed by the parser to +That query is actually transformed by the parser to: SELECT round(CAST (4 AS numeric), 4); @@ -640,7 +639,7 @@ SELECT round(CAST (4 AS numeric), 4); Since numeric constants with decimal points are initially assigned the type numeric, the following query will require no type -conversion and might therefore be slightly more efficient: +conversion and therefore might be slightly more efficient: SELECT round(4.0, 4); @@ -679,7 +678,7 @@ SELECT substr(varchar '1234', 3); (1 row) -This is transformed by the parser to effectively become +This is transformed by the parser to effectively become: SELECT substr(CAST (varchar '1234' AS text), 3); @@ -863,7 +862,7 @@ their underlying base types. If all inputs are of type unknown, resolve as type text (the preferred type of the string category). -Otherwise, the unknown inputs will be ignored. +Otherwise, unknown inputs are ignored. @@ -914,7 +913,7 @@ SELECT text 'a' AS "text" UNION SELECT 'b'; b (2 rows) -Here, the unknown-type literal 'b' will be resolved as type text. +Here, the unknown-type literal 'b' will be resolved to type text. 
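The same rules apply when inputs of different known types from one category are mixed; a small sketch (row order can vary):

SELECT 1.2 AS "number" UNION SELECT 1;
 number
--------
      1
    1.2
(2 rows)

The constant 1.2 is initially assigned type numeric, and the integer constant 1 can be implicitly cast to numeric, so the output column is resolved as numeric.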
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 5a66bab452..258472f7b9 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1,4 +1,4 @@ - + User-Defined Functions @@ -2866,7 +2866,7 @@ typedef struct /* * OPTIONAL pointer to struct containing tuple description * - * tuple_desc is for use when returning tuples (i.e. composite data types) + * tuple_desc is for use when returning tuples (i.e., composite data types) * and is only needed if you are going to build the tuples with * heap_form_tuple() rather than with BuildTupleFromCStrings(). Note that * the TupleDesc pointer stored here should usually have been run through diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml index b1caba0a55..751b1bcdd1 100644 --- a/doc/src/sgml/xml2.sgml +++ b/doc/src/sgml/xml2.sgml @@ -1,4 +1,4 @@ - + xml2 @@ -173,7 +173,7 @@ the name of the key field — this is just a field to be used as - the first column of the output table, i.e. it identifies the record from + the first column of the output table, i.e., it identifies the record from which each output row came (see note below about multiple values)