|
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>29.4. WAL Configuration</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="wal-async-commit.html" title="29.3. Asynchronous Commit" /><link rel="next" href="wal-internals.html" title="29.5. WAL Internals" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">29.4. <acronym xmlns="http://www.w3.org/1999/xhtml" class="acronym">WAL</acronym> Configuration</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="wal-async-commit.html" title="29.3. Asynchronous Commit">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="wal.html" title="Chapter 29. Reliability and the Write-Ahead Log">Up</a></td><th width="60%" align="center">Chapter 29. Reliability and the Write-Ahead Log</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="wal-internals.html" title="29.5. WAL Internals">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="WAL-CONFIGURATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">29.4. <acronym class="acronym">WAL</acronym> Configuration</h2></div></div></div><p>
- There are several <acronym class="acronym">WAL</acronym>-related configuration parameters that
- affect database performance. This section explains their use.
- Consult <a class="xref" href="runtime-config.html" title="Chapter 19. Server Configuration">Chapter 19</a> for general information about
- setting server configuration parameters.
- </p><p>
- <em class="firstterm">Checkpoints</em><a id="id-1.6.16.6.3.2" class="indexterm"></a>
- are points in the sequence of transactions at which it is guaranteed
- that the heap and index data files have been updated with all
- information written before that checkpoint. At checkpoint time, all
- dirty data pages are flushed to disk and a special checkpoint record is
- written to the log file. (The change records were previously flushed
- to the <acronym class="acronym">WAL</acronym> files.)
- In the event of a crash, the crash recovery procedure looks at the latest
- checkpoint record to determine the point in the log (known as the redo
- record) from which it should start the REDO operation. Any changes made to
- data files before that point are guaranteed to be already on disk.
- Hence, after a checkpoint, log segments preceding the one containing
- the redo record are no longer needed and can be recycled or removed. (When
- <acronym class="acronym">WAL</acronym> archiving is being done, the log segments must be
- archived before being recycled or removed.)
- </p><p>
- The checkpoint requirement of flushing all dirty data pages to disk
- can cause a significant I/O load. For this reason, checkpoint
- activity is throttled so that I/O begins at checkpoint start and completes
- before the next checkpoint is due to start; this minimizes performance
- degradation during checkpoints.
- </p><p>
- The server's checkpointer process automatically performs
- a checkpoint every so often. A checkpoint is begun every <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-TIMEOUT">checkpoint_timeout</a> seconds, or if
- <a class="xref" href="runtime-config-wal.html#GUC-MAX-WAL-SIZE">max_wal_size</a> is about to be exceeded,
- whichever comes first.
- The default settings are 5 minutes and 1 GB, respectively.
- If no WAL has been written since the previous checkpoint, new checkpoints
- will be skipped even if <code class="varname">checkpoint_timeout</code> has passed.
- (If WAL archiving is being used and you want to put a lower limit on how
- often files are archived in order to bound potential data loss, you should
- adjust the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT">archive_timeout</a> parameter rather than the
- checkpoint parameters.)
- It is also possible to force a checkpoint by using the SQL
- command <code class="command">CHECKPOINT</code>.
- </p><p>
- Reducing <code class="varname">checkpoint_timeout</code> and/or
- <code class="varname">max_wal_size</code> causes checkpoints to occur
- more often. This allows faster after-crash recovery, since less work
- will need to be redone. However, one must balance this against the
- increased cost of flushing dirty data pages more often. If
- <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set (as is the default), there is
- another factor to consider. To ensure data page consistency,
- the first modification of a data page after each checkpoint results in
- logging the entire page content. In that case,
- a smaller checkpoint interval increases the volume of output to the WAL log,
- partially negating the goal of using a smaller interval,
- and in any case causing more disk I/O.
- </p><p>
- Checkpoints are fairly expensive, first because they require writing
- out all currently dirty buffers, and second because they result in
- extra subsequent WAL traffic as discussed above. It is therefore
- wise to set the checkpointing parameters high enough so that checkpoints
- don't happen too often. As a simple sanity check on your checkpointing
- parameters, you can set the <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-WARNING">checkpoint_warning</a>
- parameter. If checkpoints happen closer together than
- <code class="varname">checkpoint_warning</code> seconds,
- a message will be output to the server log recommending increasing
- <code class="varname">max_wal_size</code>. Occasional appearance of such
- a message is not cause for alarm, but if it appears often then the
- checkpoint control parameters should be increased. Bulk operations such
- as large <code class="command">COPY</code> transfers might cause a number of such warnings
- to appear if you have not set <code class="varname">max_wal_size</code> high
- enough.
- </p><p>
- To avoid flooding the I/O system with a burst of page writes,
- writing dirty buffers during a checkpoint is spread over a period of time.
- That period is controlled by
- <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET">checkpoint_completion_target</a>, which is
- given as a fraction of the checkpoint interval.
- The I/O rate is adjusted so that the checkpoint finishes when the
- given fraction of
- <code class="varname">checkpoint_timeout</code> seconds have elapsed, or before
- <code class="varname">max_wal_size</code> is exceeded, whichever is sooner.
- With the default value of 0.5,
- <span class="productname">PostgreSQL</span> can be expected to complete each checkpoint
- in about half the time before the next checkpoint starts. On a system
- that's very close to maximum I/O throughput during normal operation,
- you might want to increase <code class="varname">checkpoint_completion_target</code>
- to reduce the I/O load from checkpoints. The disadvantage of this is that
- prolonging checkpoints affects recovery time, because more WAL segments
- will need to be kept around for possible use in recovery. Although
- <code class="varname">checkpoint_completion_target</code> can be set as high as 1.0,
- it is best to keep it less than that (perhaps 0.9 at most) since
- checkpoints include some other activities besides writing dirty buffers.
- A setting of 1.0 is quite likely to result in checkpoints not being
- completed on time, which would result in performance loss due to
- unexpected variation in the number of WAL segments needed.
- </p><p>
- On Linux and POSIX platforms <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-FLUSH-AFTER">checkpoint_flush_after</a>
- allows to force the OS that pages written by the checkpoint should be
- flushed to disk after a configurable number of bytes. Otherwise, these
- pages may be kept in the OS's page cache, inducing a stall when
- <code class="literal">fsync</code> is issued at the end of a checkpoint. This setting will
- often help to reduce transaction latency, but it also can have an adverse
- effect on performance; particularly for workloads that are bigger than
- <a class="xref" href="runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</a>, but smaller than the OS's page cache.
- </p><p>
- The number of WAL segment files in <code class="filename">pg_wal</code> directory depends on
- <code class="varname">min_wal_size</code>, <code class="varname">max_wal_size</code> and
- the amount of WAL generated in previous checkpoint cycles. When old log
- segment files are no longer needed, they are removed or recycled (that is,
- renamed to become future segments in the numbered sequence). If, due to a
- short-term peak of log output rate, <code class="varname">max_wal_size</code> is
- exceeded, the unneeded segment files will be removed until the system
- gets back under this limit. Below that limit, the system recycles enough
- WAL files to cover the estimated need until the next checkpoint, and
- removes the rest. The estimate is based on a moving average of the number
- of WAL files used in previous checkpoint cycles. The moving average
- is increased immediately if the actual usage exceeds the estimate, so it
- accommodates peak usage rather than average usage to some extent.
- <code class="varname">min_wal_size</code> puts a minimum on the amount of WAL files
- recycled for future usage; that much WAL is always recycled for future use,
- even if the system is idle and the WAL usage estimate suggests that little
- WAL is needed.
- </p><p>
- Independently of <code class="varname">max_wal_size</code>,
- <a class="xref" href="runtime-config-replication.html#GUC-WAL-KEEP-SEGMENTS">wal_keep_segments</a> + 1 most recent WAL files are
- kept at all times. Also, if WAL archiving is used, old segments can not be
- removed or recycled until they are archived. If WAL archiving cannot keep up
- with the pace that WAL is generated, or if <code class="varname">archive_command</code>
- fails repeatedly, old WAL files will accumulate in <code class="filename">pg_wal</code>
- until the situation is resolved. A slow or failed standby server that
- uses a replication slot will have the same effect (see
- <a class="xref" href="warm-standby.html#STREAMING-REPLICATION-SLOTS" title="26.2.6. Replication Slots">Section 26.2.6</a>).
- </p><p>
- In archive recovery or standby mode, the server periodically performs
- <em class="firstterm">restartpoints</em>,<a id="id-1.6.16.6.12.2" class="indexterm"></a>
- which are similar to checkpoints in normal operation: the server forces
- all its state to disk, updates the <code class="filename">pg_control</code> file to
- indicate that the already-processed WAL data need not be scanned again,
- and then recycles any old log segment files in the <code class="filename">pg_wal</code>
- directory.
- Restartpoints can't be performed more frequently than checkpoints in the
- master because restartpoints can only be performed at checkpoint records.
- A restartpoint is triggered when a checkpoint record is reached if at
- least <code class="varname">checkpoint_timeout</code> seconds have passed since the last
- restartpoint, or if WAL size is about to exceed
- <code class="varname">max_wal_size</code>. However, because of limitations on when a
- restartpoint can be performed, <code class="varname">max_wal_size</code> is often exceeded
- during recovery, by up to one checkpoint cycle's worth of WAL.
- (<code class="varname">max_wal_size</code> is never a hard limit anyway, so you should
- always leave plenty of headroom to avoid running out of disk space.)
- </p><p>
- There are two commonly used internal <acronym class="acronym">WAL</acronym> functions:
- <code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>.
- <code class="function">XLogInsertRecord</code> is used to place a new record into
- the <acronym class="acronym">WAL</acronym> buffers in shared memory. If there is no
- space for the new record, <code class="function">XLogInsertRecord</code> will have
- to write (move to kernel cache) a few filled <acronym class="acronym">WAL</acronym>
- buffers. This is undesirable because <code class="function">XLogInsertRecord</code>
- is used on every database low level modification (for example, row
- insertion) at a time when an exclusive lock is held on affected
- data pages, so the operation needs to be as fast as possible. What
- is worse, writing <acronym class="acronym">WAL</acronym> buffers might also force the
- creation of a new log segment, which takes even more
- time. Normally, <acronym class="acronym">WAL</acronym> buffers should be written
- and flushed by an <code class="function">XLogFlush</code> request, which is
- made, for the most part, at transaction commit time to ensure that
- transaction records are flushed to permanent storage. On systems
- with high log output, <code class="function">XLogFlush</code> requests might
- not occur often enough to prevent <code class="function">XLogInsertRecord</code>
- from having to do writes. On such systems
- one should increase the number of <acronym class="acronym">WAL</acronym> buffers by
- modifying the <a class="xref" href="runtime-config-wal.html#GUC-WAL-BUFFERS">wal_buffers</a> parameter. When
- <a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set and the system is very busy,
- setting <code class="varname">wal_buffers</code> higher will help smooth response times
- during the period immediately following each checkpoint.
- </p><p>
- The <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-DELAY">commit_delay</a> parameter defines for how many
- microseconds a group commit leader process will sleep after acquiring a
- lock within <code class="function">XLogFlush</code>, while group commit
- followers queue up behind the leader. This delay allows other server
- processes to add their commit records to the WAL buffers so that all of
- them will be flushed by the leader's eventual sync operation. No sleep
- will occur if <a class="xref" href="runtime-config-wal.html#GUC-FSYNC">fsync</a> is not enabled, or if fewer
- than <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-SIBLINGS">commit_siblings</a> other sessions are currently
- in active transactions; this avoids sleeping when it's unlikely that
- any other session will commit soon. Note that on some platforms, the
- resolution of a sleep request is ten milliseconds, so that any nonzero
- <code class="varname">commit_delay</code> setting between 1 and 10000
- microseconds would have the same effect. Note also that on some
- platforms, sleep operations may take slightly longer than requested by
- the parameter.
- </p><p>
- Since the purpose of <code class="varname">commit_delay</code> is to allow the
- cost of each flush operation to be amortized across concurrently
- committing transactions (potentially at the expense of transaction
- latency), it is necessary to quantify that cost before the setting can
- be chosen intelligently. The higher that cost is, the more effective
- <code class="varname">commit_delay</code> is expected to be in increasing
- transaction throughput, up to a point. The <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program can be used to measure the average time
- in microseconds that a single WAL flush operation takes. A value of
- half of the average time the program reports it takes to flush after a
- single 8kB write operation is often the most effective setting for
- <code class="varname">commit_delay</code>, so this value is recommended as the
- starting point to use when optimizing for a particular workload. While
- tuning <code class="varname">commit_delay</code> is particularly useful when the
- WAL log is stored on high-latency rotating disks, benefits can be
- significant even on storage media with very fast sync times, such as
- solid-state drives or RAID arrays with a battery-backed write cache;
- but this should definitely be tested against a representative workload.
- Higher values of <code class="varname">commit_siblings</code> should be used in
- such cases, whereas smaller <code class="varname">commit_siblings</code> values
- are often helpful on higher latency media. Note that it is quite
- possible that a setting of <code class="varname">commit_delay</code> that is too
- high can increase transaction latency by so much that total transaction
- throughput suffers.
- </p><p>
- When <code class="varname">commit_delay</code> is set to zero (the default), it
- is still possible for a form of group commit to occur, but each group
- will consist only of sessions that reach the point where they need to
- flush their commit records during the window in which the previous
- flush operation (if any) is occurring. At higher client counts a
- <span class="quote">“<span class="quote">gangway effect</span>”</span> tends to occur, so that the effects of group
- commit become significant even when <code class="varname">commit_delay</code> is
- zero, and thus explicitly setting <code class="varname">commit_delay</code> tends
- to help less. Setting <code class="varname">commit_delay</code> can only help
- when (1) there are some concurrently committing transactions, and (2)
- throughput is limited to some degree by commit rate; but with high
- rotational latency this setting can be effective in increasing
- transaction throughput with as few as two clients (that is, a single
- committing client with one sibling transaction).
- </p><p>
- The <a class="xref" href="runtime-config-wal.html#GUC-WAL-SYNC-METHOD">wal_sync_method</a> parameter determines how
- <span class="productname">PostgreSQL</span> will ask the kernel to force
- <acronym class="acronym">WAL</acronym> updates out to disk.
- All the options should be the same in terms of reliability, with
- the exception of <code class="literal">fsync_writethrough</code>, which can sometimes
- force a flush of the disk cache even when other options do not do so.
- However, it's quite platform-specific which one will be the fastest.
- You can test the speeds of different options using the <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program.
- Note that this parameter is irrelevant if <code class="varname">fsync</code>
- has been turned off.
- </p><p>
- Enabling the <a class="xref" href="runtime-config-developer.html#GUC-WAL-DEBUG">wal_debug</a> configuration parameter
- (provided that <span class="productname">PostgreSQL</span> has been
- compiled with support for it) will result in each
- <code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>
- <acronym class="acronym">WAL</acronym> call being logged to the server log. This
- option might be replaced by a more general mechanism in the future.
- </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="wal-async-commit.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="wal.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="wal-internals.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">29.3. Asynchronous Commit </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 29.5. WAL Internals</td></tr></table></div></body></html>
|