From 9bb955c8286c20474b5462eea3e3cf76c694d88f Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Wed, 18 Feb 2015 22:33:39 -0500 Subject: [PATCH] Update assorted TOAST-related documentation. While working on documentation for expanded arrays, I noticed a number of details in the TOAST-related documentation that were already inaccurate or obsolete. This should be fixed independently of whether expanded arrays get in or not. One issue is that the already existing indirect-pointer facility was not documented at all. Also, the documentation says that you only need to use VARSIZE/SET_VARSIZE if you've made your variable-length type TOAST-aware, but actually we've forced that business on all varlena types even if they've opted out of TOAST by setting storage = plain. Wordsmith a few other things too, like an amusingly archaic claim that there are few 64-bit machines. I thought about back-patching this, but since all this doco is oriented to hackers and C-coded extension authors, fixing it in HEAD is probably good enough. --- doc/src/sgml/ref/create_type.sgml | 25 ++++-- doc/src/sgml/storage.sgml | 143 ++++++++++++++++++++++-------- doc/src/sgml/xtypes.sgml | 52 ++++++----- 3 files changed, 157 insertions(+), 63 deletions(-) diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index e5d7992bbf..f9e1297d0b 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -329,15 +329,17 @@ CREATE TYPE name to VARIABLE. (Internally, this is represented by setting typlen to -1.) The internal representation of all variable-length types must start with a 4-byte integer giving the total - length of this value of the type. + length of this value of the type. (Note that the length field is often + encoded, as described in ; it's unwise + to access it directly.) The optional flag PASSEDBYVALUE indicates that values of this data type are passed by value, rather than by - reference. 
You cannot pass by value types whose internal - representation is larger than the size of the Datum type - (4 bytes on most machines, 8 bytes on a few). + reference. Types passed by value must be fixed-length, and their internal + representation cannot be larger than the size of the Datum type + (4 bytes on some machines, 8 bytes on others). @@ -367,6 +369,17 @@ CREATE TYPE name external items.) + + All storage values other + than plain imply that the functions of the data type + can handle values that have been toasted, as described + in and . + The specific other value given merely determines the default TOAST + storage strategy for columns of a toastable data type; users can pick + other strategies for individual columns using ALTER TABLE + SET STORAGE. + + The like_type parameter provides an alternative method for specifying the basic representation @@ -465,8 +478,8 @@ CREATE TYPE name identical things, and you want to allow these things to be accessed directly by subscripting, in addition to whatever operations you plan to provide for the type as a whole. For example, type point - is represented as just two floating-point numbers, each can be accessed using - point[0] and point[1]. + is represented as just two floating-point numbers, which can be accessed + using point[0] and point[1]. Note that this facility only works for fixed-length types whose internal form is exactly a sequence of identical fixed-length fields. A subscriptable diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index 85a8de2ece..d8c52875d8 100644 --- a/doc/src/sgml/storage.sgml +++ b/doc/src/sgml/storage.sgml @@ -303,25 +303,33 @@ Oversized-Attribute Storage Technique). PostgreSQL uses a fixed page size (commonly -8 kB), and does not allow tuples to span multiple pages. Therefore, it is +8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. 
To overcome -this limitation, large field values are compressed and/or broken up into -multiple physical rows. This happens transparently to the user, with only +this limitation, large field values are compressed and/or broken up into +multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately -known as TOAST (or the best thing since sliced bread). +known as TOAST (or the best thing since sliced bread). +The TOAST infrastructure is also used to improve handling of +large data values in-memory. Only certain data types support TOAST — there is no need to impose the overhead on data types that cannot produce large field values. To support TOAST, a data type must have a variable-length -(varlena) representation, in which the first 32-bit word of any -stored value contains the total length of the value in bytes (including -itself). TOAST does not constrain the rest of the representation. -All the C-level functions supporting a TOAST-able data type must -be careful to handle TOASTed input values. (This is normally done -by invoking PG_DETOAST_DATUM before doing anything with an input -value, but in some cases more efficient approaches are possible.) +(varlena) representation, in which, ordinarily, the first +four-byte word of any stored value contains the total length of the value in +bytes (including itself). TOAST does not constrain the rest +of the data type's representation. The special representations collectively +called TOASTed values work by modifying or +reinterpreting this initial length word. Therefore, the C-level functions +supporting a TOAST-able data type must be careful about how they +handle potentially TOASTed input values: an input might not +actually consist of a four-byte length word and contents until after it's +been detoasted. 
(This is normally done by invoking +PG_DETOAST_DATUM before doing anything with an input value, +but in some cases more efficient approaches are possible. +See for more detail.) @@ -333,58 +341,84 @@ the value is an ordinary un-TOASTed value of the data type, and the remaining bits of the length word give the total datum size (including length word) in bytes. When the highest-order or lowest-order bit is set, the value has only a single-byte header instead of the normal four-byte -header, and the remaining bits give the total datum size (including length -byte) in bytes. As a special case, if the remaining bits are all zero -(which would be impossible for a self-inclusive length), the value is a -pointer to out-of-line data stored in a separate TOAST table. (The size of -a TOAST pointer is given in the second byte of the datum.) -Values with single-byte headers aren't aligned on any particular -boundary, either. Lastly, when the highest-order or lowest-order bit is -clear but the adjacent bit is set, the content of the datum has been -compressed and must be decompressed before use. In this case the remaining -bits of the length word give the total size of the compressed datum, not the +header, and the remaining bits of that byte give the total datum size +(including length byte) in bytes. This alternative supports space-efficient +storage of values shorter than 127 bytes, while still allowing the data type +to grow to 1 GB at need. Values with single-byte headers aren't aligned on +any particular boundary, whereas values with four-byte headers are aligned on +at least a four-byte boundary; this omission of alignment padding provides +additional space savings that is significant compared to short values. +As a special case, if the remaining bits of a single-byte header are all +zero (which would be impossible for a self-inclusive length), the value is +a pointer to out-of-line data, with several possible alternatives as +described below. 
The type and size of such a TOAST pointer +are determined by a code stored in the second byte of the datum. +Lastly, when the highest-order or lowest-order bit is clear but the adjacent +bit is set, the content of the datum has been compressed and must be +decompressed before use. In this case the remaining bits of the four-byte +length word give the total size of the compressed datum, not the original data. Note that compression is also possible for out-of-line data but the varlena header does not tell whether it has occurred — -the content of the TOAST pointer tells that, instead. +the content of the TOAST pointer tells that, instead. -If any of the columns of a table are TOAST-able, the table will -have an associated TOAST table, whose OID is stored in the table's -pg_class.reltoastrelid entry. Out-of-line -TOASTed values are kept in the TOAST table, as -described in more detail below. +As mentioned, there are multiple types of TOAST pointer datums. +The oldest and most common type is a pointer to out-of-line data stored in +a TOAST table that is separate from, but +associated with, the table containing the TOAST pointer datum +itself. These on-disk pointer datums are created by the +TOAST management code (in access/heap/tuptoaster.c) +when a tuple to be stored on disk is too large to be stored as-is. +Further details appear in . +Alternatively, a TOAST pointer datum can contain a pointer to +out-of-line data that appears elsewhere in memory. Such datums are +necessarily short-lived, and will never appear on-disk, but they are very +useful for avoiding copying and redundant processing of large data values. +Further details appear in . -The compression technique used is a fairly simple and very fast member +The compression technique used for either in-line or out-of-line compressed +data is a fairly simple and very fast member of the LZ family of compression techniques. See src/common/pg_lzcompress.c for the details. 
+ + Out-of-line, on-disk TOAST storage + + +If any of the columns of a table are TOAST-able, the table will +have an associated TOAST table, whose OID is stored in the table's +pg_class.reltoastrelid entry. On-disk +TOASTed values are kept in the TOAST table, as +described in more detail below. + + Out-of-line values are divided (after compression if used) into chunks of at most TOAST_MAX_CHUNK_SIZE bytes (by default this value is chosen so that four chunk rows will fit on a page, making it about 2000 bytes). -Each chunk is stored -as a separate row in the TOAST table for the owning table. Every +Each chunk is stored as a separate row in the TOAST table +belonging to the owning table. Every TOAST table has the columns chunk_id (an OID identifying the particular TOASTed value), chunk_seq (a sequence number for the chunk within its value), and chunk_data (the actual data of the chunk). A unique index on chunk_id and chunk_seq provides fast -retrieval of the values. A pointer datum representing an out-of-line +retrieval of the values. A pointer datum representing an out-of-line on-disk TOASTed value therefore needs to store the OID of the TOAST table in which to look and the OID of the specific value (its chunk_id). For convenience, pointer datums also store the -logical datum size (original uncompressed data length) and actual stored size +logical datum size (original uncompressed data length) and physical stored size (different if compression was applied). Allowing for the varlena header bytes, -the total size of a TOAST pointer datum is therefore 18 bytes -regardless of the actual size of the represented value. +the total size of an on-disk TOAST pointer datum is therefore 18 +bytes regardless of the actual size of the represented value. -The TOAST code is triggered only +The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). 
The TOAST code will compress and/or move @@ -397,8 +431,8 @@ none of the out-of-line values change. -The TOAST code recognizes four different strategies for storing -TOAST-able columns: +The TOAST management code recognizes four different strategies +for storing TOAST-able columns on disk: @@ -460,6 +494,41 @@ pages). There was no run time difference compared to an un-TOASTed comparison table, in which all the HTML pages were cut down to 7 kB to fit. + + + + Out-of-line, in-memory TOAST storage + + +TOAST pointers can point to data that is not on disk, but is +elsewhere in the memory of the current server process. Such pointers +obviously cannot be long-lived, but they are nonetheless useful. There +is currently just one sub-case: +pointers to indirect data. + + + +Indirect TOAST pointers simply point at a non-indirect varlena +value stored somewhere in memory. This case was originally created merely +as a proof of concept, but it is currently used during logical decoding to +avoid possibly having to create physical tuples exceeding 1 GB (as pulling +all out-of-line field values into the tuple might do). The case is of +limited use since the creator of the pointer datum is entirely responsible +that the referenced data survives for as long as the pointer could exist, +and there is no infrastructure to help with this. + + + +For all types of in-memory TOAST pointer, the TOAST +management code ensures that no such pointer datum can accidentally get +stored on disk. In-memory TOAST pointers are automatically +expanded to normal in-line varlena values before storage — and then +possibly converted to on-disk TOAST pointers, if the containing +tuple would otherwise be too big. 
+ + + + diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml index e1340baeb7..2459616281 100644 --- a/doc/src/sgml/xtypes.sgml +++ b/doc/src/sgml/xtypes.sgml @@ -234,35 +234,49 @@ CREATE TYPE complex ( + If the internal representation of the data type is variable-length, the + internal representation must follow the standard layout for variable-length + data: the first four bytes must be a char[4] field which is + never accessed directly (customarily named vl_len_). You + must use the SET_VARSIZE() macro to store the total + size of the datum (including the length field itself) in this field + and VARSIZE() to retrieve it. (These macros exist + because the length field may be encoded depending on platform.) + + + + For further details see the description of the + command. + + + + TOAST Considerations TOAST and user-defined types - If the values of your data type vary in size (in internal form), you should - make the data type TOAST-able (see ). You should do this even if the data are always + + + If the values of your data type vary in size (in internal form), it's + usually desirable to make the data type TOAST-able (see ). You should do this even if the values are always too small to be compressed or stored externally, because TOAST can save space on small data too, by reducing header overhead. - To do this, the internal representation must follow the standard layout for - variable-length data: the first four bytes must be a char[4] - field which is never accessed directly (customarily named - vl_len_). You - must use SET_VARSIZE() to store the size of the datum - in this field and VARSIZE() to retrieve it. The C - functions operating on the data type must always be careful to unpack any - toasted values they are handed, by using PG_DETOAST_DATUM. - (This detail is customarily hidden by defining type-specific - GETARG_DATATYPE_P macros.) 
Then, when running the - CREATE TYPE command, specify the internal length as - variable and select the appropriate storage option. + To support TOAST storage, the C functions operating on the data + type must always be careful to unpack any toasted values they are handed + by using PG_DETOAST_DATUM. (This detail is customarily hidden + by defining type-specific GETARG_DATATYPE_P macros.) + Then, when running the CREATE TYPE command, specify the + internal length as variable and select some appropriate storage + option other than plain. - If the alignment is unimportant (either just for a specific function or + If data alignment is unimportant (either just for a specific function or because the data type specifies byte alignment anyway) then it's possible to avoid some of the overhead of PG_DETOAST_DATUM. You can use PG_DETOAST_DATUM_PACKED instead (customarily hidden by @@ -286,8 +300,6 @@ CREATE TYPE complex ( - - For further details see the description of the - command. - + + -- 2.40.0
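
Illustrative note (not part of the patch): the storage.sgml hunk above describes how the leading bits of a varlena value select between a four-byte header, a single-byte header, an in-line compressed value, and a TOAST pointer. The sketch below models that decision for the "highest-order bit" (big-endian) layout only. All names in it (VarlenaKind, classify_first_byte, short_total_size, long_total_size) are hypothetical and are not PostgreSQL's own macros; real extension code should use VARSIZE(), SET_VARSIZE() and PG_DETOAST_DATUM rather than inspecting the header directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a varlena datum by its first byte. */
typedef enum
{
    VL_PLAIN_4B,        /* ordinary value, four-byte length word */
    VL_COMPRESSED_4B,   /* in-line compressed, four-byte length word */
    VL_SHORT_1B,        /* short value, single-byte header */
    VL_TOAST_POINTER    /* pointer datum; its kind code is in the second byte */
} VarlenaKind;

/*
 * Assumes the big-endian layout, where the two flag bits are the
 * highest-order bits of the first byte.
 */
static VarlenaKind
classify_first_byte(uint8_t b)
{
    if (b & 0x80)
    {
        /* Single-byte header; an all-zero length marks a TOAST pointer. */
        return ((b & 0x7F) == 0) ? VL_TOAST_POINTER : VL_SHORT_1B;
    }
    /* Four-byte header; the adjacent bit marks in-line compression. */
    return (b & 0x40) ? VL_COMPRESSED_4B : VL_PLAIN_4B;
}

/* Total size (including the header byte) of a short-header value. */
static unsigned
short_total_size(uint8_t b)
{
    assert(classify_first_byte(b) == VL_SHORT_1B);
    return b & 0x7F;            /* 7 bits, so at most 127 bytes */
}

/* Total size (including the length word) from a four-byte header. */
static uint32_t
long_total_size(const uint8_t *p)
{
    uint32_t word = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
                    ((uint32_t) p[2] << 8) | (uint32_t) p[3];

    return word & 0x3FFFFFFF;   /* 30 bits remain, hence the 1 GB limit */
}
```

Under these assumptions a first byte of 0x85 is a short value of five bytes in total, while 0x80 marks a TOAST pointer. The masks make the two size limits in the text visible: seven bits for short headers and thirty bits for four-byte headers.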