<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>29.4. WAL Configuration</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="wal-async-commit.html" title="29.3. Asynchronous Commit" /><link rel="next" href="wal-internals.html" title="29.5. WAL Internals" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">29.4. <acronym xmlns="http://www.w3.org/1999/xhtml" class="acronym">WAL</acronym> Configuration</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="wal-async-commit.html" title="29.3. Asynchronous Commit">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="wal.html" title="Chapter 29. Reliability and the Write-Ahead Log">Up</a></td><th width="60%" align="center">Chapter 29. Reliability and the Write-Ahead Log</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="wal-internals.html" title="29.5. WAL Internals">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="WAL-CONFIGURATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">29.4. <acronym class="acronym">WAL</acronym> Configuration</h2></div></div></div><p>
There are several <acronym class="acronym">WAL</acronym>-related configuration parameters that
affect database performance. This section explains their use.
Consult <a class="xref" href="runtime-config.html" title="Chapter 19. Server Configuration">Chapter 19</a> for general information about
setting server configuration parameters.
</p><p>
<em class="firstterm">Checkpoints</em><a id="id-1.6.16.6.3.2" class="indexterm"></a>
are points in the sequence of transactions at which it is guaranteed
that the heap and index data files have been updated with all
information written before that checkpoint. At checkpoint time, all
dirty data pages are flushed to disk and a special checkpoint record is
written to the log file. (The change records were previously flushed
to the <acronym class="acronym">WAL</acronym> files.)
In the event of a crash, the crash recovery procedure looks at the latest
checkpoint record to determine the point in the log (known as the redo
record) from which it should start the REDO operation. Any changes made to
data files before that point are guaranteed to be already on disk.
Hence, after a checkpoint, log segments preceding the one containing
the redo record are no longer needed and can be recycled or removed. (When
<acronym class="acronym">WAL</acronym> archiving is being done, the log segments must be
archived before being recycled or removed.)
</p><p>
The checkpoint requirement of flushing all dirty data pages to disk
can cause a significant I/O load. For this reason, checkpoint
activity is throttled so that I/O begins at checkpoint start and completes
before the next checkpoint is due to start; this minimizes performance
degradation during checkpoints.
</p><p>
The server's checkpointer process automatically performs
a checkpoint every so often. A checkpoint is begun every <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-TIMEOUT">checkpoint_timeout</a> seconds, or if
<a class="xref" href="runtime-config-wal.html#GUC-MAX-WAL-SIZE">max_wal_size</a> is about to be exceeded,
whichever comes first.
The default settings are 5 minutes and 1 GB, respectively.
If no WAL has been written since the previous checkpoint, new checkpoints
will be skipped even if <code class="varname">checkpoint_timeout</code> has passed.
(If WAL archiving is being used and you want to put a lower limit on how
often files are archived in order to bound potential data loss, you should
adjust the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT">archive_timeout</a> parameter rather than the
checkpoint parameters.)
It is also possible to force a checkpoint by using the SQL
command <code class="command">CHECKPOINT</code>.
</p><p>
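As an illustration, checkpoint spacing might be configured in <code class="filename">postgresql.conf</code> like this (the values shown are examples, not recommendations; the defaults are 5 minutes and 1 GB):

```
checkpoint_timeout = 15min   # maximum time between automatic checkpoints
max_wal_size = 4GB           # begin a checkpoint before this much WAL accumulates
archive_timeout = 300        # seconds; only relevant when WAL archiving is in use
```

</p><p>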
Reducing <code class="varname">checkpoint_timeout</code> and/or
<code class="varname">max_wal_size</code> causes checkpoints to occur
more often. This allows faster after-crash recovery, since less work
will need to be redone. However, one must balance this against the
increased cost of flushing dirty data pages more often. If
<a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set (as is the default), there is
another factor to consider. To ensure data page consistency,
the first modification of a data page after each checkpoint results in
logging the entire page content. In that case,
a smaller checkpoint interval increases the volume of output to the WAL log,
partially negating the goal of using a smaller interval,
and in any case causing more disk I/O.
</p><p>
Checkpoints are fairly expensive, first because they require writing
out all currently dirty buffers, and second because they result in
extra subsequent WAL traffic as discussed above. It is therefore
wise to set the checkpointing parameters high enough so that checkpoints
don't happen too often. As a simple sanity check on your checkpointing
parameters, you can set the <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-WARNING">checkpoint_warning</a>
parameter. If checkpoints happen closer together than
<code class="varname">checkpoint_warning</code> seconds,
a message will be output to the server log recommending increasing
<code class="varname">max_wal_size</code>. Occasional appearance of such
a message is not cause for alarm, but if it appears often then the
checkpoint control parameters should be increased. Bulk operations such
as large <code class="command">COPY</code> transfers might cause a number of such warnings
to appear if you have not set <code class="varname">max_wal_size</code> high
enough.
</p><p>
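One way to see whether checkpoints are occurring because the timeout expired or because WAL volume forced them is to query the cumulative counters in the <code class="structname">pg_stat_bgwriter</code> view; a high proportion of requested (rather than timed) checkpoints suggests that <code class="varname">max_wal_size</code> is too low for the workload:

```sql
-- Timed checkpoints are triggered by checkpoint_timeout;
-- requested checkpoints are triggered by WAL volume or the CHECKPOINT command.
SELECT checkpoints_timed, checkpoints_req
  FROM pg_stat_bgwriter;
```

</p><p>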
To avoid flooding the I/O system with a burst of page writes,
writing dirty buffers during a checkpoint is spread over a period of time.
That period is controlled by
<a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-COMPLETION-TARGET">checkpoint_completion_target</a>, which is
given as a fraction of the checkpoint interval.
The I/O rate is adjusted so that the checkpoint finishes when the
given fraction of
<code class="varname">checkpoint_timeout</code> seconds have elapsed, or before
<code class="varname">max_wal_size</code> is exceeded, whichever is sooner.
With the default value of 0.5,
<span class="productname">PostgreSQL</span> can be expected to complete each checkpoint
in about half the time before the next checkpoint starts. On a system
that's very close to maximum I/O throughput during normal operation,
you might want to increase <code class="varname">checkpoint_completion_target</code>
to reduce the I/O load from checkpoints. The disadvantage of this is that
prolonging checkpoints affects recovery time, because more WAL segments
will need to be kept around for possible use in recovery. Although
<code class="varname">checkpoint_completion_target</code> can be set as high as 1.0,
it is best to keep it less than that (perhaps 0.9 at most) since
checkpoints include some other activities besides writing dirty buffers.
A setting of 1.0 is quite likely to result in checkpoints not being
completed on time, which would result in performance loss due to
unexpected variation in the number of WAL segments needed.
</p><p>
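As a worked example with illustrative values: with a 300-second <code class="varname">checkpoint_timeout</code> and a completion target of 0.9, the checkpointer paces its writes to finish roughly 0.9 × 300 = 270 seconds after the checkpoint starts:

```
checkpoint_timeout = 5min            # 300 seconds between checkpoints
checkpoint_completion_target = 0.9   # spread writes over about 270 of those seconds
```

</p><p>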
On Linux and POSIX platforms, <a class="xref" href="runtime-config-wal.html#GUC-CHECKPOINT-FLUSH-AFTER">checkpoint_flush_after</a>
allows forcing the OS to flush pages written by the checkpoint to
disk after a configurable number of bytes. Otherwise, these
pages may be kept in the OS's page cache, inducing a stall when
<code class="literal">fsync</code> is issued at the end of a checkpoint. This setting will
often help to reduce transaction latency, but it can also have an adverse
effect on performance, particularly for workloads that are bigger than
<a class="xref" href="runtime-config-resource.html#GUC-SHARED-BUFFERS">shared_buffers</a>, but smaller than the OS's page cache.
</p><p>
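A sketch of such a setting (the value shown is only an example; setting it to 0 disables the feature):

```
checkpoint_flush_after = 256kB   # request OS writeback after this many bytes
```

</p><p>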
The number of WAL segment files in the <code class="filename">pg_wal</code> directory depends on
<code class="varname">min_wal_size</code>, <code class="varname">max_wal_size</code> and
the amount of WAL generated in previous checkpoint cycles. When old log
segment files are no longer needed, they are removed or recycled (that is,
renamed to become future segments in the numbered sequence). If, due to a
short-term peak of log output rate, <code class="varname">max_wal_size</code> is
exceeded, the unneeded segment files will be removed until the system
gets back under this limit. Below that limit, the system recycles enough
WAL files to cover the estimated need until the next checkpoint, and
removes the rest. The estimate is based on a moving average of the number
of WAL files used in previous checkpoint cycles. The moving average
is increased immediately if the actual usage exceeds the estimate, so it
accommodates peak usage rather than average usage to some extent.
<code class="varname">min_wal_size</code> puts a minimum on the amount of WAL files
recycled for future usage; that much WAL is always recycled for future use,
even if the system is idle and the WAL usage estimate suggests that little
WAL is needed.
</p><p>
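For example (illustrative values), the recycling behavior is bounded like this in <code class="filename">postgresql.conf</code>:

```
min_wal_size = 1GB   # always keep at least this much WAL recycled for reuse
max_wal_size = 2GB   # soft limit; checkpoints are triggered to stay under it
```

</p><p>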
Independently of <code class="varname">max_wal_size</code>,
<a class="xref" href="runtime-config-replication.html#GUC-WAL-KEEP-SEGMENTS">wal_keep_segments</a> + 1 most recent WAL files are
kept at all times. Also, if WAL archiving is used, old segments cannot be
removed or recycled until they are archived. If WAL archiving cannot keep up
with the pace that WAL is generated, or if <code class="varname">archive_command</code>
fails repeatedly, old WAL files will accumulate in <code class="filename">pg_wal</code>
until the situation is resolved. A slow or failed standby server that
uses a replication slot will have the same effect (see
<a class="xref" href="warm-standby.html#STREAMING-REPLICATION-SLOTS" title="26.2.6. Replication Slots">Section 26.2.6</a>).
</p><p>
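Replication slots that are holding back WAL removal can be inspected with a query such as:

```sql
-- An inactive slot with an old restart_lsn prevents removal of WAL in pg_wal.
SELECT slot_name, active, restart_lsn
  FROM pg_replication_slots;
```

</p><p>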
In archive recovery or standby mode, the server periodically performs
<em class="firstterm">restartpoints</em>,<a id="id-1.6.16.6.12.2" class="indexterm"></a>
which are similar to checkpoints in normal operation: the server forces
all its state to disk, updates the <code class="filename">pg_control</code> file to
indicate that the already-processed WAL data need not be scanned again,
and then recycles any old log segment files in the <code class="filename">pg_wal</code>
directory.
Restartpoints can't be performed more frequently than checkpoints in the
master because restartpoints can only be performed at checkpoint records.
A restartpoint is triggered when a checkpoint record is reached if at
least <code class="varname">checkpoint_timeout</code> seconds have passed since the last
restartpoint, or if WAL size is about to exceed
<code class="varname">max_wal_size</code>. However, because of limitations on when a
restartpoint can be performed, <code class="varname">max_wal_size</code> is often exceeded
during recovery, by up to one checkpoint cycle's worth of WAL.
(<code class="varname">max_wal_size</code> is never a hard limit anyway, so you should
always leave plenty of headroom to avoid running out of disk space.)
</p><p>
There are two commonly used internal <acronym class="acronym">WAL</acronym> functions:
<code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>.
<code class="function">XLogInsertRecord</code> is used to place a new record into
the <acronym class="acronym">WAL</acronym> buffers in shared memory. If there is no
space for the new record, <code class="function">XLogInsertRecord</code> will have
to write (move to kernel cache) a few filled <acronym class="acronym">WAL</acronym>
buffers. This is undesirable because <code class="function">XLogInsertRecord</code>
is used on every database low-level modification (for example, row
insertion) at a time when an exclusive lock is held on affected
data pages, so the operation needs to be as fast as possible. What
is worse, writing <acronym class="acronym">WAL</acronym> buffers might also force the
creation of a new log segment, which takes even more
time. Normally, <acronym class="acronym">WAL</acronym> buffers should be written
and flushed by an <code class="function">XLogFlush</code> request, which is
made, for the most part, at transaction commit time to ensure that
transaction records are flushed to permanent storage. On systems
with high log output, <code class="function">XLogFlush</code> requests might
not occur often enough to prevent <code class="function">XLogInsertRecord</code>
from having to do writes. On such systems
one should increase the number of <acronym class="acronym">WAL</acronym> buffers by
modifying the <a class="xref" href="runtime-config-wal.html#GUC-WAL-BUFFERS">wal_buffers</a> parameter. When
<a class="xref" href="runtime-config-wal.html#GUC-FULL-PAGE-WRITES">full_page_writes</a> is set and the system is very busy,
setting <code class="varname">wal_buffers</code> higher will help smooth response times
during the period immediately following each checkpoint.
</p><p>
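On such a system, one might raise the setting along these lines (the value is illustrative; with the default of -1 the size is chosen automatically based on <code class="varname">shared_buffers</code>):

```
wal_buffers = 16MB   # size of the WAL buffer pool in shared memory
```

</p><p>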
The <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-DELAY">commit_delay</a> parameter specifies how many
microseconds a group commit leader process will sleep after acquiring a
lock within <code class="function">XLogFlush</code>, while group commit
followers queue up behind the leader. This delay allows other server
processes to add their commit records to the WAL buffers so that all of
them will be flushed by the leader's eventual sync operation. No sleep
will occur if <a class="xref" href="runtime-config-wal.html#GUC-FSYNC">fsync</a> is not enabled, or if fewer
than <a class="xref" href="runtime-config-wal.html#GUC-COMMIT-SIBLINGS">commit_siblings</a> other sessions are currently
in active transactions; this avoids sleeping when it's unlikely that
any other session will commit soon. Note that on some platforms, the
resolution of a sleep request is ten milliseconds, so that any nonzero
<code class="varname">commit_delay</code> setting between 1 and 10000
microseconds would have the same effect. Note also that on some
platforms, sleep operations may take slightly longer than requested by
the parameter.
</p><p>
Since the purpose of <code class="varname">commit_delay</code> is to allow the
cost of each flush operation to be amortized across concurrently
committing transactions (potentially at the expense of transaction
latency), it is necessary to quantify that cost before the setting can
be chosen intelligently. The higher that cost is, the more effective
<code class="varname">commit_delay</code> is expected to be in increasing
transaction throughput, up to a point. The <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program can be used to measure the average time
in microseconds that a single WAL flush operation takes. A value of
half of the average time the program reports it takes to flush after a
single 8kB write operation is often the most effective setting for
<code class="varname">commit_delay</code>, so this value is recommended as the
starting point to use when optimizing for a particular workload. While
tuning <code class="varname">commit_delay</code> is particularly useful when the
WAL log is stored on high-latency rotating disks, benefits can be
significant even on storage media with very fast sync times, such as
solid-state drives or RAID arrays with a battery-backed write cache;
but this should definitely be tested against a representative workload.
Higher values of <code class="varname">commit_siblings</code> should be used in
such cases, whereas smaller <code class="varname">commit_siblings</code> values
are often helpful on higher latency media. Note that it is quite
possible that a setting of <code class="varname">commit_delay</code> that is too
high can increase transaction latency by so much that total transaction
throughput suffers.
</p><p>
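For example, assuming a hypothetical <span class="application">pg_test_fsync</span> report of 4000 microseconds per single 8kB write flush, half of that value would be the suggested starting point:

```
commit_delay = 2000   # microseconds; half of an assumed 4000 µs average flush time
commit_siblings = 5   # minimum concurrent open transactions before delaying
```

</p><p>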
When <code class="varname">commit_delay</code> is set to zero (the default), it
is still possible for a form of group commit to occur, but each group
will consist only of sessions that reach the point where they need to
flush their commit records during the window in which the previous
flush operation (if any) is occurring. At higher client counts a
<span class="quote">“<span class="quote">gangway effect</span>”</span> tends to occur, so that the effects of group
commit become significant even when <code class="varname">commit_delay</code> is
zero, and thus explicitly setting <code class="varname">commit_delay</code> tends
to help less. Setting <code class="varname">commit_delay</code> can only help
when (1) there are some concurrently committing transactions, and (2)
throughput is limited to some degree by commit rate; but with high
rotational latency this setting can be effective in increasing
transaction throughput with as few as two clients (that is, a single
committing client with one sibling transaction).
</p><p>
The <a class="xref" href="runtime-config-wal.html#GUC-WAL-SYNC-METHOD">wal_sync_method</a> parameter determines how
<span class="productname">PostgreSQL</span> will ask the kernel to force
<acronym class="acronym">WAL</acronym> updates out to disk.
All the options should be the same in terms of reliability, with
the exception of <code class="literal">fsync_writethrough</code>, which can sometimes
force a flush of the disk cache even when other options do not do so.
However, it's quite platform-specific which one will be the fastest.
You can test the speeds of different options using the <a class="xref" href="pgtestfsync.html" title="pg_test_fsync"><span class="refentrytitle"><span class="application">pg_test_fsync</span></span></a> program.
Note that this parameter is irrelevant if <code class="varname">fsync</code>
has been turned off.
</p><p>
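For instance (the best choice is platform-specific and should be verified with <span class="application">pg_test_fsync</span>):

```
wal_sync_method = fdatasync   # one of the available sync methods; test before changing
```

</p><p>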
Enabling the <a class="xref" href="runtime-config-developer.html#GUC-WAL-DEBUG">wal_debug</a> configuration parameter
(provided that <span class="productname">PostgreSQL</span> has been
compiled with support for it) will result in each
<code class="function">XLogInsertRecord</code> and <code class="function">XLogFlush</code>
<acronym class="acronym">WAL</acronym> call being logged to the server log. This
option might be replaced by a more general mechanism in the future.
</p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="wal-async-commit.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="wal.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="wal-internals.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">29.3. Asynchronous Commit </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 29.5. WAL Internals</td></tr></table></div></body></html>