|
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>68.2. TOAST</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="storage-file-layout.html" title="68.1. Database File Layout" /><link rel="next" href="storage-fsm.html" title="68.3. Free Space Map" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">68.2. TOAST</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="storage-file-layout.html" title="68.1. Database File Layout">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="storage.html" title="Chapter 68. Database Physical Storage">Up</a></td><th width="60%" align="center">Chapter 68. Database Physical Storage</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="storage-fsm.html" title="68.3. Free Space Map">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="STORAGE-TOAST"><div class="titlepage"><div><div><h2 class="title" style="clear: both">68.2. TOAST</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-ONDISK">68.2.1. Out-of-Line, On-Disk TOAST Storage</a></span></dt><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-INMEMORY">68.2.2. Out-of-Line, In-Memory TOAST Storage</a></span></dt></dl></div><a id="id-1.10.21.4.2" class="indexterm"></a><a id="id-1.10.21.4.3" class="indexterm"></a><p>
- This section provides an overview of <acronym class="acronym">TOAST</acronym> (The
- Oversized-Attribute Storage Technique).
- </p><p>
- <span class="productname">PostgreSQL</span> uses a fixed page size (commonly
- 8 kB), and does not allow tuples to span multiple pages. Therefore, it is
- not possible to store very large field values directly. To overcome
- this limitation, large field values are compressed and/or broken up into
- multiple physical rows. This happens transparently to the user, with only
- small impact on most of the backend code. The technique is affectionately
- known as <acronym class="acronym">TOAST</acronym> (or <span class="quote">“<span class="quote">the best thing since sliced bread</span>”</span>).
- The <acronym class="acronym">TOAST</acronym> infrastructure is also used to improve handling of
- large data values in-memory.
- </p><p>
- Only certain data types support <acronym class="acronym">TOAST</acronym> — there is no need to
- impose the overhead on data types that cannot produce large field values.
- To support <acronym class="acronym">TOAST</acronym>, a data type must have a variable-length
- (<em class="firstterm">varlena</em>) representation, in which, ordinarily, the first
- four-byte word of any stored value contains the total length of the value in
- bytes (including itself). <acronym class="acronym">TOAST</acronym> does not constrain the rest
- of the data type's representation. The special representations collectively
- called <em class="firstterm"><acronym class="acronym">TOAST</acronym>ed values</em> work by modifying or
- reinterpreting this initial length word. Therefore, the C-level functions
- supporting a <acronym class="acronym">TOAST</acronym>-able data type must be careful about how they
- handle potentially <acronym class="acronym">TOAST</acronym>ed input values: an input might not
- actually consist of a four-byte length word and contents until after it's
- been <em class="firstterm">detoasted</em>. (This is normally done by invoking
- <code class="function">PG_DETOAST_DATUM</code> before doing anything with an input value,
- but in some cases more efficient approaches are possible.
- See <a class="xref" href="xtypes.html#XTYPES-TOAST" title="37.13.1. TOAST Considerations">Section 37.13.1</a> for more detail.)
- </p><p>
- <acronym class="acronym">TOAST</acronym> usurps two bits of the varlena length word (the high-order
- bits on big-endian machines, the low-order bits on little-endian machines),
- thereby limiting the logical size of any value of a <acronym class="acronym">TOAST</acronym>-able
- data type to 1 GB (2<sup>30</sup> - 1 bytes). When both bits are zero,
- the value is an ordinary un-<acronym class="acronym">TOAST</acronym>ed value of the data type, and
- the remaining bits of the length word give the total datum size (including
- length word) in bytes. When the highest-order or lowest-order bit is set,
- the value has only a single-byte header instead of the normal four-byte
- header, and the remaining bits of that byte give the total datum size
- (including length byte) in bytes. This alternative supports space-efficient
- storage of values shorter than 127 bytes, while still allowing the data type
- to grow to 1 GB at need. Values with single-byte headers aren't aligned on
- any particular boundary, whereas values with four-byte headers are aligned on
- at least a four-byte boundary; this omission of alignment padding provides
- additional space savings that is significant compared to short values.
- As a special case, if the remaining bits of a single-byte header are all
- zero (which would be impossible for a self-inclusive length), the value is
- a pointer to out-of-line data, with several possible alternatives as
- described below. The type and size of such a <em class="firstterm">TOAST pointer</em>
- are determined by a code stored in the second byte of the datum.
- Lastly, when the highest-order or lowest-order bit is clear but the adjacent
- bit is set, the content of the datum has been compressed and must be
- decompressed before use. In this case the remaining bits of the four-byte
- length word give the total size of the compressed datum, not the
- original data. Note that compression is also possible for out-of-line data
- but the varlena header does not tell whether it has occurred —
- the content of the <acronym class="acronym">TOAST</acronym> pointer tells that, instead.
- </p><p>
- As mentioned, there are multiple types of <acronym class="acronym">TOAST</acronym> pointer datums.
- The oldest and most common type is a pointer to out-of-line data stored in
- a <em class="firstterm"><acronym class="acronym">TOAST</acronym> table</em> that is separate from, but
- associated with, the table containing the <acronym class="acronym">TOAST</acronym> pointer datum
- itself. These <em class="firstterm">on-disk</em> pointer datums are created by the
- <acronym class="acronym">TOAST</acronym> management code (in <code class="filename">access/heap/tuptoaster.c</code>)
- when a tuple to be stored on disk is too large to be stored as-is.
- Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-ONDISK" title="68.2.1. Out-of-Line, On-Disk TOAST Storage">Section 68.2.1</a>.
- Alternatively, a <acronym class="acronym">TOAST</acronym> pointer datum can contain a pointer to
- out-of-line data that appears elsewhere in memory. Such datums are
- necessarily short-lived, and will never appear on-disk, but they are very
- useful for avoiding copying and redundant processing of large data values.
- Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-INMEMORY" title="68.2.2. Out-of-Line, In-Memory TOAST Storage">Section 68.2.2</a>.
- </p><p>
- The compression technique used for either in-line or out-of-line compressed
- data is a fairly simple and very fast member
- of the LZ family of compression techniques. See
- <code class="filename">src/common/pg_lzcompress.c</code> for the details.
- </p><div class="sect2" id="STORAGE-TOAST-ONDISK"><div class="titlepage"><div><div><h3 class="title">68.2.1. Out-of-Line, On-Disk TOAST Storage</h3></div></div></div><p>
- If any of the columns of a table are <acronym class="acronym">TOAST</acronym>-able, the table will
- have an associated <acronym class="acronym">TOAST</acronym> table, whose OID is stored in the table's
- <code class="structname">pg_class</code>.<code class="structfield">reltoastrelid</code> entry. On-disk
- <acronym class="acronym">TOAST</acronym>ed values are kept in the <acronym class="acronym">TOAST</acronym> table, as
- described in more detail below.
- </p><p>
- Out-of-line values are divided (after compression if used) into chunks of at
- most <code class="symbol">TOAST_MAX_CHUNK_SIZE</code> bytes (by default this value is chosen
- so that four chunk rows will fit on a page, making it about 2000 bytes).
- Each chunk is stored as a separate row in the <acronym class="acronym">TOAST</acronym> table
- belonging to the owning table. Every
- <acronym class="acronym">TOAST</acronym> table has the columns <code class="structfield">chunk_id</code> (an OID
- identifying the particular <acronym class="acronym">TOAST</acronym>ed value),
- <code class="structfield">chunk_seq</code> (a sequence number for the chunk within its value),
- and <code class="structfield">chunk_data</code> (the actual data of the chunk). A unique index
- on <code class="structfield">chunk_id</code> and <code class="structfield">chunk_seq</code> provides fast
- retrieval of the values. A pointer datum representing an out-of-line on-disk
- <acronym class="acronym">TOAST</acronym>ed value therefore needs to store the OID of the
- <acronym class="acronym">TOAST</acronym> table in which to look and the OID of the specific value
- (its <code class="structfield">chunk_id</code>). For convenience, pointer datums also store the
- logical datum size (original uncompressed data length) and physical stored size
- (different if compression was applied). Allowing for the varlena header bytes,
- the total size of an on-disk <acronym class="acronym">TOAST</acronym> pointer datum is therefore 18
- bytes regardless of the actual size of the represented value.
- </p><p>
- The <acronym class="acronym">TOAST</acronym> management code is triggered only
- when a row value to be stored in a table is wider than
- <code class="symbol">TOAST_TUPLE_THRESHOLD</code> bytes (normally 2 kB).
- The <acronym class="acronym">TOAST</acronym> code will compress and/or move
- field values out-of-line until the row value is shorter than
- <code class="symbol">TOAST_TUPLE_TARGET</code> bytes (also normally 2 kB, adjustable)
- or no more gains can be had. During an UPDATE
- operation, values of unchanged fields are normally preserved as-is; so an
- UPDATE of a row with out-of-line values incurs no <acronym class="acronym">TOAST</acronym> costs if
- none of the out-of-line values change.
- </p><p>
- The <acronym class="acronym">TOAST</acronym> management code recognizes four different strategies
- for storing <acronym class="acronym">TOAST</acronym>-able columns on disk:
-
- </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
- <code class="literal">PLAIN</code> prevents either compression or
- out-of-line storage; furthermore it disables use of single-byte headers
- for varlena types.
- This is the only possible strategy for
- columns of non-<acronym class="acronym">TOAST</acronym>-able data types.
- </p></li><li class="listitem"><p>
- <code class="literal">EXTENDED</code> allows both compression and out-of-line
- storage. This is the default for most <acronym class="acronym">TOAST</acronym>-able data types.
- Compression will be attempted first, then out-of-line storage if
- the row is still too big.
- </p></li><li class="listitem"><p>
- <code class="literal">EXTERNAL</code> allows out-of-line storage but not
- compression. Use of <code class="literal">EXTERNAL</code> will
- make substring operations on wide <code class="type">text</code> and
- <code class="type">bytea</code> columns faster (at the penalty of increased storage
- space) because these operations are optimized to fetch only the
- required parts of the out-of-line value when it is not compressed.
- </p></li><li class="listitem"><p>
- <code class="literal">MAIN</code> allows compression but not out-of-line
- storage. (Actually, out-of-line storage will still be performed
- for such columns, but only as a last resort when there is no other
- way to make the row small enough to fit on a page.)
- </p></li></ul></div><p>
-
- Each <acronym class="acronym">TOAST</acronym>-able data type specifies a default strategy for columns
- of that data type, but the strategy for a given table column can be altered
- with <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET STORAGE</code></a>.
- </p><p>
- <code class="symbol">TOAST_TUPLE_TARGET</code> can be adjusted for each table using
- <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET (toast_tuple_target = N)</code></a>
- </p><p>
- This scheme has a number of advantages compared to a more straightforward
- approach such as allowing row values to span pages. Assuming that queries are
- usually qualified by comparisons against relatively small key values, most of
- the work of the executor will be done using the main row entry. The big values
- of <acronym class="acronym">TOAST</acronym>ed attributes will only be pulled out (if selected at all)
- at the time the result set is sent to the client. Thus, the main table is much
- smaller and more of its rows fit in the shared buffer cache than would be the
- case without any out-of-line storage. Sort sets shrink also, and sorts will
- more often be done entirely in memory. A little test showed that a table
- containing typical HTML pages and their URLs was stored in about half of the
- raw data size including the <acronym class="acronym">TOAST</acronym> table, and that the main table
- contained only about 10% of the entire data (the URLs and some small HTML
- pages). There was no run time difference compared to an un-<acronym class="acronym">TOAST</acronym>ed
- comparison table, in which all the HTML pages were cut down to 7 kB to fit.
- </p></div><div class="sect2" id="STORAGE-TOAST-INMEMORY"><div class="titlepage"><div><div><h3 class="title">68.2.2. Out-of-Line, In-Memory TOAST Storage</h3></div></div></div><p>
- <acronym class="acronym">TOAST</acronym> pointers can point to data that is not on disk, but is
- elsewhere in the memory of the current server process. Such pointers
- obviously cannot be long-lived, but they are nonetheless useful. There
- are currently two sub-cases:
- pointers to <em class="firstterm">indirect</em> data and
- pointers to <em class="firstterm">expanded</em> data.
- </p><p>
- Indirect <acronym class="acronym">TOAST</acronym> pointers simply point at a non-indirect varlena
- value stored somewhere in memory. This case was originally created merely
- as a proof of concept, but it is currently used during logical decoding to
- avoid possibly having to create physical tuples exceeding 1 GB (as pulling
- all out-of-line field values into the tuple might do). The case is of
- limited use since the creator of the pointer datum is entirely responsible
- that the referenced data survives for as long as the pointer could exist,
- and there is no infrastructure to help with this.
- </p><p>
- Expanded <acronym class="acronym">TOAST</acronym> pointers are useful for complex data types
- whose on-disk representation is not especially suited for computational
- purposes. As an example, the standard varlena representation of a
- <span class="productname">PostgreSQL</span> array includes dimensionality information, a
- nulls bitmap if there are any null elements, then the values of all the
- elements in order. When the element type itself is variable-length, the
- only way to find the <em class="replaceable"><code>N</code></em>'th element is to scan through all the
- preceding elements. This representation is appropriate for on-disk storage
- because of its compactness, but for computations with the array it's much
- nicer to have an <span class="quote">“<span class="quote">expanded</span>”</span> or <span class="quote">“<span class="quote">deconstructed</span>”</span>
- representation in which all the element starting locations have been
- identified. The <acronym class="acronym">TOAST</acronym> pointer mechanism supports this need by
- allowing a pass-by-reference Datum to point to either a standard varlena
- value (the on-disk representation) or a <acronym class="acronym">TOAST</acronym> pointer that
- points to an expanded representation somewhere in memory. The details of
- this expanded representation are up to the data type, though it must have
- a standard header and meet the other API requirements given
- in <code class="filename">src/include/utils/expandeddatum.h</code>. C-level functions
- working with the data type can choose to handle either representation.
- Functions that do not know about the expanded representation, but simply
- apply <code class="function">PG_DETOAST_DATUM</code> to their inputs, will automatically
- receive the traditional varlena representation; so support for an expanded
- representation can be introduced incrementally, one function at a time.
- </p><p>
- <acronym class="acronym">TOAST</acronym> pointers to expanded values are further broken down
- into <em class="firstterm">read-write</em> and <em class="firstterm">read-only</em> pointers.
- The pointed-to representation is the same either way, but a function that
- receives a read-write pointer is allowed to modify the referenced value
- in-place, whereas one that receives a read-only pointer must not; it must
- first create a copy if it wants to make a modified version of the value.
- This distinction and some associated conventions make it possible to avoid
- unnecessary copying of expanded values during query execution.
- </p><p>
- For all types of in-memory <acronym class="acronym">TOAST</acronym> pointer, the <acronym class="acronym">TOAST</acronym>
- management code ensures that no such pointer datum can accidentally get
- stored on disk. In-memory <acronym class="acronym">TOAST</acronym> pointers are automatically
- expanded to normal in-line varlena values before storage — and then
- possibly converted to on-disk <acronym class="acronym">TOAST</acronym> pointers, if the containing
- tuple would otherwise be too big.
- </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="storage-file-layout.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="storage.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="storage-fsm.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">68.1. Database File Layout </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 68.3. Free Space Map</td></tr></table></div></body></html>
|