  1. <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>26.2. Log-Shipping Standby Servers</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="different-replication-solutions.html" title="26.1. Comparison of Different Solutions" /><link rel="next" href="warm-standby-failover.html" title="26.3. Failover" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">26.2. Log-Shipping Standby Servers</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="different-replication-solutions.html" title="26.1. Comparison of Different Solutions">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="high-availability.html" title="Chapter 26. High Availability, Load Balancing, and Replication">Up</a></td><th width="60%" align="center">Chapter 26. High Availability, Load Balancing, and Replication</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="warm-standby-failover.html" title="26.3. Failover">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="WARM-STANDBY"><div class="titlepage"><div><div><h2 class="title" style="clear: both">26.2. Log-Shipping Standby Servers</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="warm-standby.html#STANDBY-PLANNING">26.2.1. Planning</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#STANDBY-SERVER-OPERATION">26.2.2. 
Standby Server Operation</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#PREPARING-MASTER-FOR-STANDBY">26.2.3. Preparing the Master for Standby Servers</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#STANDBY-SERVER-SETUP">26.2.4. Setting Up a Standby Server</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#STREAMING-REPLICATION">26.2.5. Streaming Replication</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#STREAMING-REPLICATION-SLOTS">26.2.6. Replication Slots</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#CASCADING-REPLICATION">26.2.7. Cascading Replication</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#SYNCHRONOUS-REPLICATION">26.2.8. Synchronous Replication</a></span></dt><dt><span class="sect2"><a href="warm-standby.html#CONTINUOUS-ARCHIVING-IN-STANDBY">26.2.9. Continuous Archiving in Standby</a></span></dt></dl></div><p>
  3. Continuous archiving can be used to create a <em class="firstterm">high
  4. availability</em> (HA) cluster configuration with one or more
  5. <em class="firstterm">standby servers</em> ready to take over operations if the
  6. primary server fails. This capability is widely referred to as
  7. <em class="firstterm">warm standby</em> or <em class="firstterm">log shipping</em>.
  8. </p><p>
  9. The primary and standby server work together to provide this capability,
  10. though the servers are only loosely coupled. The primary server operates
  11. in continuous archiving mode, while each standby server operates in
  12. continuous recovery mode, reading the WAL files from the primary. No
  13. changes to the database tables are required to enable this capability,
  14. so it offers low administration overhead compared to some other
  15. replication solutions. This configuration also has relatively low
  16. performance impact on the primary server.
  17. </p><p>
  18. Directly moving WAL records from one database server to another
  19. is typically described as log shipping. <span class="productname">PostgreSQL</span>
  20. implements file-based log shipping by transferring WAL records
  21. one file (WAL segment) at a time. WAL files (16MB) can be
  22. shipped easily and cheaply over any distance, whether it be to an
  23. adjacent system, another system at the same site, or another system on
  24. the far side of the globe. The bandwidth required for this technique
  25. varies according to the transaction rate of the primary server.
  26. Record-based log shipping is more granular and streams WAL changes
  27. incrementally over a network connection (see <a class="xref" href="warm-standby.html#STREAMING-REPLICATION" title="26.2.5. Streaming Replication">Section 26.2.5</a>).
  28. </p><p>
Log shipping is asynchronous, i.e., the WAL
records are shipped after transaction commit. As a result, there is a
window for data loss should the primary server suffer a catastrophic
failure; transactions not yet shipped will be lost. The size of the
data loss window in file-based log shipping can be limited by use of the
<code class="varname">archive_timeout</code> parameter, which can be set as low
as a few seconds. However, such a low setting will
substantially increase the bandwidth required for file shipping.
  37. Streaming replication (see <a class="xref" href="warm-standby.html#STREAMING-REPLICATION" title="26.2.5. Streaming Replication">Section 26.2.5</a>)
  38. allows a much smaller window of data loss.
  39. </p><p>
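As a rough illustration of the bandwidth cost of a low
<code class="varname">archive_timeout</code>, the following Python sketch estimates
the archive traffic caused by forced segment switches alone. It assumes
the default 16MB segment size, an archive command that copies files
uncompressed, and a primary that writes at least a little WAL in every
interval but never fills a segment before the timeout. The function name
is made up for this example.
</p><pre class="programlisting">
```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (assumed)


def forced_switch_rate(archive_timeout_s, segment_bytes=SEGMENT_BYTES):
    """Bytes per second shipped if one full-size segment is archived per timeout."""
    if archive_timeout_s == 0:
        raise ValueError("archive_timeout = 0 disables forced switches")
    return segment_bytes / archive_timeout_s


for timeout in (60, 10, 5):
    mib_per_hour = forced_switch_rate(timeout) * 3600 / (1024 * 1024)
    print(f"archive_timeout = {timeout}s: about {mib_per_hour:.0f} MiB per hour")
```
</pre><p>
A busier primary fills segments before the timeout expires and therefore
ships more than this estimate.
</p><p>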
  40. Recovery performance is sufficiently good that the standby will
  41. typically be only moments away from full
  42. availability once it has been activated. As a result, this is called
  43. a warm standby configuration which offers high
  44. availability. Restoring a server from an archived base backup and
  45. rollforward will take considerably longer, so that technique only
  46. offers a solution for disaster recovery, not high availability.
  47. A standby server can also be used for read-only queries, in which case
  48. it is called a Hot Standby server. See <a class="xref" href="hot-standby.html" title="26.5. Hot Standby">Section 26.5</a> for
  49. more information.
  50. </p><a id="id-1.6.13.16.7" class="indexterm"></a><a id="id-1.6.13.16.8" class="indexterm"></a><a id="id-1.6.13.16.9" class="indexterm"></a><a id="id-1.6.13.16.10" class="indexterm"></a><a id="id-1.6.13.16.11" class="indexterm"></a><a id="id-1.6.13.16.12" class="indexterm"></a><div class="sect2" id="STANDBY-PLANNING"><div class="titlepage"><div><div><h3 class="title">26.2.1. Planning</h3></div></div></div><p>
  51. It is usually wise to create the primary and standby servers
  52. so that they are as similar as possible, at least from the
  53. perspective of the database server. In particular, the path names
  54. associated with tablespaces will be passed across unmodified, so both
  55. primary and standby servers must have the same mount paths for
  56. tablespaces if that feature is used. Keep in mind that if
  57. <a class="xref" href="sql-createtablespace.html" title="CREATE TABLESPACE"><span class="refentrytitle">CREATE TABLESPACE</span></a>
  58. is executed on the primary, any new mount point needed for it must
  59. be created on the primary and all standby servers before the command
  60. is executed. Hardware need not be exactly the same, but experience shows
  61. that maintaining two identical systems is easier than maintaining two
  62. dissimilar ones over the lifetime of the application and system.
  63. In any case the hardware architecture must be the same — shipping
  64. from, say, a 32-bit to a 64-bit system will not work.
  65. </p><p>
  66. In general, log shipping between servers running different major
  67. <span class="productname">PostgreSQL</span> release
  68. levels is not possible. It is the policy of the PostgreSQL Global
  69. Development Group not to make changes to disk formats during minor release
  70. upgrades, so it is likely that running different minor release levels
  71. on primary and standby servers will work successfully. However, no
  72. formal support for that is offered and you are advised to keep primary
  73. and standby servers at the same release level as much as possible.
  74. When updating to a new minor release, the safest policy is to update
  75. the standby servers first — a new minor release is more likely
  76. to be able to read WAL files from a previous minor release than vice
  77. versa.
  78. </p></div><div class="sect2" id="STANDBY-SERVER-OPERATION"><div class="titlepage"><div><div><h3 class="title">26.2.2. Standby Server Operation</h3></div></div></div><p>
  79. In standby mode, the server continuously applies WAL received from the
  80. master server. The standby server can read WAL from a WAL archive
  81. (see <a class="xref" href="runtime-config-wal.html#GUC-RESTORE-COMMAND">restore_command</a>) or directly from the master
  82. over a TCP connection (streaming replication). The standby server will
  83. also attempt to restore any WAL found in the standby cluster's
  84. <code class="filename">pg_wal</code> directory. That typically happens after a server
restart, when the standby again replays WAL that was streamed from the
  86. master before the restart, but you can also manually copy files to
  87. <code class="filename">pg_wal</code> at any time to have them replayed.
  88. </p><p>
  89. At startup, the standby begins by restoring all WAL available in the
  90. archive location, calling <code class="varname">restore_command</code>. Once it
  91. reaches the end of WAL available there and <code class="varname">restore_command</code>
  92. fails, it tries to restore any WAL available in the <code class="filename">pg_wal</code> directory.
  93. If that fails, and streaming replication has been configured, the
  94. standby tries to connect to the primary server and start streaming WAL
  95. from the last valid record found in archive or <code class="filename">pg_wal</code>. If that fails
  96. or streaming replication is not configured, or if the connection is
later disconnected, the standby returns to the first step and tries to
restore WAL from the archive again. This loop of retries from the
archive, <code class="filename">pg_wal</code>, and via streaming replication goes on until the server
is stopped or is promoted with <code class="command">pg_ctl promote</code> or a trigger file.
  101. </p><p>
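The order in which a standby tries its WAL sources at startup can be
sketched as a small state function. This is a simplified Python model of
the loop described above, not the server's actual logic; the source
names and the function name are invented for illustration.
</p><pre class="programlisting">
```python
def next_wal_source(current, succeeded, streaming_configured):
    """Pick the WAL source a standby tries next (simplified model)."""
    if succeeded:
        return current  # keep reading while this source still yields WAL
    if current == "archive":
        return "pg_wal"
    if current == "pg_wal" and streaming_configured:
        return "stream"
    return "archive"  # streaming failed or is not configured: start over


# Trace one pass: the archive runs dry, pg_wal has nothing new,
# streaming works for a while, then the connection drops.
source = "archive"
for succeeded in (True, False, False, True, False):
    following = next_wal_source(source, succeeded, streaming_configured=True)
    print(f"{source}: {'ok' if succeeded else 'failed'}, next: {following}")
    source = following
```
</pre><p>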
  102. Standby mode is exited and the server switches to normal operation
  103. when <code class="command">pg_ctl promote</code> is run or a trigger file is found
  104. (<code class="varname">promote_trigger_file</code>). Before failover,
  105. any WAL immediately available in the archive or in <code class="filename">pg_wal</code> will be
  106. restored, but no attempt is made to connect to the master.
  107. </p></div><div class="sect2" id="PREPARING-MASTER-FOR-STANDBY"><div class="titlepage"><div><div><h3 class="title">26.2.3. Preparing the Master for Standby Servers</h3></div></div></div><p>
  108. Set up continuous archiving on the primary to an archive directory
  109. accessible from the standby, as described
  110. in <a class="xref" href="continuous-archiving.html" title="25.3. Continuous Archiving and Point-in-Time Recovery (PITR)">Section 25.3</a>. The archive location should be
  111. accessible from the standby even when the master is down, i.e. it should
  112. reside on the standby server itself or another trusted server, not on
  113. the master server.
  114. </p><p>
  115. If you want to use streaming replication, set up authentication on the
  116. primary server to allow replication connections from the standby
  117. server(s); that is, create a role and provide a suitable entry or
  118. entries in <code class="filename">pg_hba.conf</code> with the database field set to
  119. <code class="literal">replication</code>. Also ensure <code class="varname">max_wal_senders</code> is set
  120. to a sufficiently large value in the configuration file of the primary
  121. server. If replication slots will be used,
  122. ensure that <code class="varname">max_replication_slots</code> is set sufficiently
  123. high as well.
  124. </p><p>
  125. Take a base backup as described in <a class="xref" href="continuous-archiving.html#BACKUP-BASE-BACKUP" title="25.3.2. Making a Base Backup">Section 25.3.2</a>
  126. to bootstrap the standby server.
  127. </p></div><div class="sect2" id="STANDBY-SERVER-SETUP"><div class="titlepage"><div><div><h3 class="title">26.2.4. Setting Up a Standby Server</h3></div></div></div><p>
To set up the standby server, restore the base backup taken from the primary
  129. server (see <a class="xref" href="continuous-archiving.html#BACKUP-PITR-RECOVERY" title="25.3.4. Recovering Using a Continuous Archive Backup">Section 25.3.4</a>). Create a file
  130. <code class="filename">standby.signal</code> in the standby's cluster data
  131. directory. Set <a class="xref" href="runtime-config-wal.html#GUC-RESTORE-COMMAND">restore_command</a> to a simple command to copy files from
  132. the WAL archive. If you plan to have multiple standby servers for high
  133. availability purposes, make sure that <code class="varname">recovery_target_timeline</code> is set to
  134. <code class="literal">latest</code> (the default), to make the standby server follow the timeline change
  135. that occurs at failover to another standby.
  136. </p><div class="note"><h3 class="title">Note</h3><p>
  137. Do not use pg_standby or similar tools with the built-in standby mode
  138. described here. <a class="xref" href="runtime-config-wal.html#GUC-RESTORE-COMMAND">restore_command</a> should return immediately
  139. if the file does not exist; the server will retry the command again if
  140. necessary. See <a class="xref" href="log-shipping-alternative.html" title="26.4. Alternative Method for Log Shipping">Section 26.4</a>
  141. for using tools like pg_standby.
  142. </p></div><p>
  143. If you want to use streaming replication, fill in
  144. <a class="xref" href="runtime-config-replication.html#GUC-PRIMARY-CONNINFO">primary_conninfo</a> with a libpq connection string, including
  145. the host name (or IP address) and any additional details needed to
  146. connect to the primary server. If the primary needs a password for
  147. authentication, the password needs to be specified in
  148. <a class="xref" href="runtime-config-replication.html#GUC-PRIMARY-CONNINFO">primary_conninfo</a> as well.
  149. </p><p>
  150. If you're setting up the standby server for high availability purposes,
  151. set up WAL archiving, connections and authentication like the primary
  152. server, because the standby server will work as a primary server after
  153. failover.
  154. </p><p>
  155. If you're using a WAL archive, its size can be minimized using the <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-CLEANUP-COMMAND">archive_cleanup_command</a> parameter to remove files that are no
  156. longer required by the standby server.
  157. The <span class="application">pg_archivecleanup</span> utility is designed specifically to
  158. be used with <code class="varname">archive_cleanup_command</code> in typical single-standby
configurations; see <a class="xref" href="pgarchivecleanup.html" title="pg_archivecleanup"><span class="refentrytitle"><span class="application">pg_archivecleanup</span></span></a>.
Note, however, that if you're using the archive for backup purposes, you
  161. need to retain files needed to recover from at least the latest base
  162. backup, even if they're no longer needed by the standby.
  163. </p><p>
  164. A simple example of configuration is:
  165. </p><pre class="programlisting">
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
restore_command = 'cp /path/to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
  169. </pre><p>
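The doubled single quotes in the <code class="varname">primary_conninfo</code> line
come from two layers of quoting: a connection-string value that contains
a space is wrapped in single quotes, and each single quote is then
doubled inside the quoted configuration-file string. The following
Python sketch builds such a line from a dictionary. The helper names are
hypothetical, and values that themselves contain quotes or backslashes
are deliberately rejected rather than escaped.
</p><pre class="programlisting">
```python
def quote_conninfo_value(value):
    """Single-quote a connection-string value that is empty or contains spaces."""
    if "'" in value or "\\" in value:
        raise ValueError("quotes and backslashes are not handled by this sketch")
    if value == "" or " " in value:
        return "'" + value + "'"
    return value


def primary_conninfo_line(params):
    """Render a primary_conninfo setting for postgresql.conf from a dict."""
    conninfo = " ".join(
        f"{key}={quote_conninfo_value(value)}" for key, value in params.items()
    )
    # Inside the quoted configuration string, each single quote is doubled.
    return "primary_conninfo = '" + conninfo.replace("'", "''") + "'"


print(primary_conninfo_line({
    "host": "192.168.1.50",
    "port": "5432",
    "user": "foo",
    "password": "foopass",
    "options": "-c wal_sender_timeout=5000",
}))
```
</pre><p>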
  170. </p><p>
  171. You can have any number of standby servers, but if you use streaming
  172. replication, make sure you set <code class="varname">max_wal_senders</code> high enough in
  173. the primary to allow them to be connected simultaneously.
  174. </p></div><div class="sect2" id="STREAMING-REPLICATION"><div class="titlepage"><div><div><h3 class="title">26.2.5. Streaming Replication</h3></div></div></div><a id="id-1.6.13.16.17.2" class="indexterm"></a><p>
  175. Streaming replication allows a standby server to stay more up-to-date
  176. than is possible with file-based log shipping. The standby connects
  177. to the primary, which streams WAL records to the standby as they're
  178. generated, without waiting for the WAL file to be filled.
  179. </p><p>
  180. Streaming replication is asynchronous by default
  181. (see <a class="xref" href="warm-standby.html#SYNCHRONOUS-REPLICATION" title="26.2.8. Synchronous Replication">Section 26.2.8</a>), in which case there is
  182. a small delay between committing a transaction in the primary and the
  183. changes becoming visible in the standby. This delay is however much
  184. smaller than with file-based log shipping, typically under one second
  185. assuming the standby is powerful enough to keep up with the load. With
  186. streaming replication, <code class="varname">archive_timeout</code> is not required to
  187. reduce the data loss window.
  188. </p><p>
  189. If you use streaming replication without file-based continuous
  190. archiving, the server might recycle old WAL segments before the standby
  191. has received them. If this occurs, the standby will need to be
  192. reinitialized from a new base backup. You can avoid this by setting
  193. <code class="varname">wal_keep_segments</code> to a value large enough to ensure that
  194. WAL segments are not recycled too early, or by configuring a replication
  195. slot for the standby. If you set up a WAL archive that's accessible from
  196. the standby, these solutions are not required, since the standby can
always use the archive to catch up, provided the archive retains enough segments.
  198. </p><p>
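When sizing <code class="varname">wal_keep_segments</code>, a back-of-the-envelope
estimate of how long a standby may stay disconnected can help. The
Python sketch below assumes the default 16MB segment size and a steady
WAL generation rate, both of which are simplifications; the function
name and the workload figures are hypothetical.
</p><pre class="programlisting">
```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (assumed)


def outage_covered_seconds(wal_keep_segments, wal_bytes_per_second):
    """Rough time a standby can stay disconnected before needed WAL is recycled."""
    if wal_bytes_per_second == 0:
        return float("inf")  # an idle primary generates no WAL to recycle
    return wal_keep_segments * SEGMENT_BYTES / wal_bytes_per_second


# Hypothetical workload: 2 MiB of WAL per second, 256 segments kept (4 GiB).
seconds = outage_covered_seconds(256, 2 * 1024 * 1024)
print(f"covers roughly {seconds / 60:.0f} minutes of disconnection")
```
</pre><p>
Because the setting is a minimum, the primary may keep more WAL than
this, so the figure is best read as a conservative estimate.
</p><p>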
  199. To use streaming replication, set up a file-based log-shipping standby
  200. server as described in <a class="xref" href="warm-standby.html" title="26.2. Log-Shipping Standby Servers">Section 26.2</a>. The step that
turns a file-based log-shipping standby into a streaming replication
standby is setting <code class="varname">primary_conninfo</code>
to point to the primary server. Set
  204. <a class="xref" href="runtime-config-connection.html#GUC-LISTEN-ADDRESSES">listen_addresses</a> and authentication options
  205. (see <code class="filename">pg_hba.conf</code>) on the primary so that the standby server
  206. can connect to the <code class="literal">replication</code> pseudo-database on the primary
  207. server (see <a class="xref" href="warm-standby.html#STREAMING-REPLICATION-AUTHENTICATION" title="26.2.5.1. Authentication">Section 26.2.5.1</a>).
  208. </p><p>
  209. On systems that support the keepalive socket option, setting
  210. <a class="xref" href="runtime-config-connection.html#GUC-TCP-KEEPALIVES-IDLE">tcp_keepalives_idle</a>,
  211. <a class="xref" href="runtime-config-connection.html#GUC-TCP-KEEPALIVES-INTERVAL">tcp_keepalives_interval</a> and
  212. <a class="xref" href="runtime-config-connection.html#GUC-TCP-KEEPALIVES-COUNT">tcp_keepalives_count</a> helps the primary promptly
  213. notice a broken connection.
  214. </p><p>
  215. Set the maximum number of concurrent connections from the standby servers
  216. (see <a class="xref" href="runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</a> for details).
  217. </p><p>
  218. When the standby is started and <code class="varname">primary_conninfo</code> is set
  219. correctly, the standby will connect to the primary after replaying all
  220. WAL files available in the archive. If the connection is established
  221. successfully, you will see a walreceiver process in the standby, and
  222. a corresponding walsender process in the primary.
  223. </p><div class="sect3" id="STREAMING-REPLICATION-AUTHENTICATION"><div class="titlepage"><div><div><h4 class="title">26.2.5.1. Authentication</h4></div></div></div><p>
  224. It is very important that the access privileges for replication be set up
  225. so that only trusted users can read the WAL stream, because it is
  226. easy to extract privileged information from it. Standby servers must
  227. authenticate to the primary as a superuser or an account that has the
  228. <code class="literal">REPLICATION</code> privilege. It is recommended to create a
  229. dedicated user account with <code class="literal">REPLICATION</code> and <code class="literal">LOGIN</code>
  230. privileges for replication. While <code class="literal">REPLICATION</code> privilege gives
  231. very high permissions, it does not allow the user to modify any data on
  232. the primary system, which the <code class="literal">SUPERUSER</code> privilege does.
  233. </p><p>
  234. Client authentication for replication is controlled by a
  235. <code class="filename">pg_hba.conf</code> record specifying <code class="literal">replication</code> in the
  236. <em class="replaceable"><code>database</code></em> field. For example, if the standby is running on
  237. host IP <code class="literal">192.168.1.100</code> and the account name for replication
  238. is <code class="literal">foo</code>, the administrator can add the following line to the
  239. <code class="filename">pg_hba.conf</code> file on the primary:
  240. </p><pre class="programlisting">
# Allow the user "foo" from host 192.168.1.100 to connect to the primary
# as a replication standby if the user's password is correctly supplied.
#
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     foo             192.168.1.100/32        md5
  246. </pre><p>
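To make the effect of the <code class="literal">/32</code> mask in the record above
concrete, the following Python sketch checks a connection against one
simplified record. It models only the database, user, and address
fields, treats a replication connection as having the database name
<code class="literal">replication</code>, and ignores keywords, host names, and the
authentication method. The dictionary layout and the function name are
assumptions made for this example.
</p><pre class="programlisting">
```python
import ipaddress


def hba_record_matches(record, connection):
    """Check one simplified pg_hba.conf host record against a connection."""
    network = ipaddress.ip_network(record["address"])
    client = ipaddress.ip_address(connection["client_addr"])
    return (
        record["database"] == connection["database"]
        and record["user"] == connection["user"]
        and client in network
    )


record = {"database": "replication", "user": "foo", "address": "192.168.1.100/32"}
standby = {"database": "replication", "user": "foo", "client_addr": "192.168.1.100"}
other = {"database": "replication", "user": "foo", "client_addr": "192.168.1.101"}

print(hba_record_matches(record, standby))  # the standby's address is inside the /32
print(hba_record_matches(record, other))    # a neighbouring address is not
```
</pre><p>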
  247. </p><p>
  248. The host name and port number of the primary, connection user name,
  249. and password are specified in the <a class="xref" href="runtime-config-replication.html#GUC-PRIMARY-CONNINFO">primary_conninfo</a>.
  250. The password can also be set in the <code class="filename">~/.pgpass</code> file on the
  251. standby (specify <code class="literal">replication</code> in the <em class="replaceable"><code>database</code></em>
  252. field).
  253. For example, if the primary is running on host IP <code class="literal">192.168.1.50</code>,
  254. port <code class="literal">5432</code>, the account name for replication is
  255. <code class="literal">foo</code>, and the password is <code class="literal">foopass</code>, the administrator
  256. can add the following line to the <code class="filename">postgresql.conf</code> file on the
  257. standby:
  258. </p><pre class="programlisting">
# The standby connects to the primary that is running on host 192.168.1.50
# and port 5432 as the user "foo" whose password is "foopass".
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
  262. </pre><p>
  263. </p></div><div class="sect3" id="STREAMING-REPLICATION-MONITORING"><div class="titlepage"><div><div><h4 class="title">26.2.5.2. Monitoring</h4></div></div></div><p>
An important health indicator of streaming replication is the amount
of WAL generated in the primary but not yet applied in the
standby. You can calculate this lag by comparing the current WAL write
  267. location on the primary with the last WAL location received by the
  268. standby. These locations can be retrieved using
  269. <code class="function">pg_current_wal_lsn</code> on the primary and
  270. <code class="function">pg_last_wal_receive_lsn</code> on the standby,
  271. respectively (see <a class="xref" href="functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE" title="Table 9.84. Backup Control Functions">Table 9.84</a> and
  272. <a class="xref" href="functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE" title="Table 9.85. Recovery Information Functions">Table 9.85</a> for details).
  273. The last WAL receive location in the standby is also displayed in the
  274. process status of the WAL receiver process, displayed using the
  275. <code class="command">ps</code> command (see <a class="xref" href="monitoring-ps.html" title="27.1. Standard Unix Tools">Section 27.1</a> for details).
  276. </p><p>
  277. You can retrieve a list of WAL sender processes via the
  278. <a class="xref" href="monitoring-stats.html#PG-STAT-REPLICATION-VIEW" title="Table 27.5. pg_stat_replication View">pg_stat_replication</a> view. Large differences between
  279. <code class="function">pg_current_wal_lsn</code> and the view's <code class="literal">sent_lsn</code> field
  280. might indicate that the master server is under heavy load, while
  281. differences between <code class="literal">sent_lsn</code> and
  282. <code class="function">pg_last_wal_receive_lsn</code> on the standby might indicate
  283. network delay, or that the standby is under heavy load.
  284. </p><p>
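The comparisons described above amount to subtracting WAL locations. A
location is written as two hexadecimal numbers separated by a slash, and
the server can subtract two of them with
<code class="function">pg_wal_lsn_diff</code>. The Python sketch below does the same
arithmetic on the client side and splits the lag into the two parts
discussed above. The function names and the sample values are
hypothetical, and the three locations are assumed to have been sampled
at about the same moment.
</p><pre class="programlisting">
```python
def lsn_to_int(lsn):
    """Convert a WAL location such as '0/3000060' to a byte position."""
    high, low = lsn.split("/")
    return int(high, 16) * 2**32 + int(low, 16)


def lag_breakdown(current_lsn, sent_lsn, received_lsn):
    """Split total lag into WAL not yet sent and WAL sent but not yet received."""
    current, sent, received = map(lsn_to_int, (current_lsn, sent_lsn, received_lsn))
    return {
        "not_yet_sent": current - sent,        # large: primary may be overloaded
        "sent_not_received": sent - received,  # large: network or standby delay
        "total": current - received,
    }


print(lag_breakdown("1/10", "0/FFFFFFF0", "0/FFFFFF00"))
```
</pre><p>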
  285. On a hot standby, the status of the WAL receiver process can be retrieved
  286. via the <a class="xref" href="monitoring-stats.html#PG-STAT-WAL-RECEIVER-VIEW" title="Table 27.6. pg_stat_wal_receiver View">pg_stat_wal_receiver</a> view. A large
  287. difference between <code class="function">pg_last_wal_replay_lsn</code> and the
  288. view's <code class="literal">received_lsn</code> indicates that WAL is being
  289. received faster than it can be replayed.
  290. </p></div></div><div class="sect2" id="STREAMING-REPLICATION-SLOTS"><div class="titlepage"><div><div><h3 class="title">26.2.6. Replication Slots</h3></div></div></div><a id="id-1.6.13.16.18.2" class="indexterm"></a><p>
  291. Replication slots provide an automated way to ensure that the master does
  292. not remove WAL segments until they have been received by all standbys,
  293. and that the master does not remove rows which could cause a
  294. <a class="link" href="hot-standby.html#HOT-STANDBY-CONFLICT" title="26.5.2. Handling Query Conflicts">recovery conflict</a> even when the
  295. standby is disconnected.
  296. </p><p>
  297. In lieu of using replication slots, it is possible to prevent the removal
  298. of old WAL segments using <a class="xref" href="runtime-config-replication.html#GUC-WAL-KEEP-SEGMENTS">wal_keep_segments</a>, or by
  299. storing the segments in an archive using
  300. <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-COMMAND">archive_command</a>.
  301. However, these methods often result in retaining more WAL segments than
  302. required, whereas replication slots retain only the number of segments
  303. known to be needed. An advantage of these methods is that they bound
  304. the space requirement for <code class="literal">pg_wal</code>; there is currently no way
  305. to do this using replication slots.
  306. </p><p>
  307. Similarly, <a class="xref" href="runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK">hot_standby_feedback</a>
  308. and <a class="xref" href="runtime-config-replication.html#GUC-VACUUM-DEFER-CLEANUP-AGE">vacuum_defer_cleanup_age</a> provide protection against
  309. relevant rows being removed by vacuum, but the former provides no
  310. protection during any time period when the standby is not connected,
  311. and the latter often needs to be set to a high value to provide adequate
  312. protection. Replication slots overcome these disadvantages.
  313. </p><div class="sect3" id="STREAMING-REPLICATION-SLOTS-MANIPULATION"><div class="titlepage"><div><div><h4 class="title">26.2.6.1. Querying and Manipulating Replication Slots</h4></div></div></div><p>
  314. Each replication slot has a name, which can contain lower-case letters,
  315. numbers, and the underscore character.
  316. </p><p>
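A client-side check of the naming rule stated above might look like the
following Python sketch. It tests only the permitted characters; any
length limit the server applies is not checked, and the function name is
invented for this example.
</p><pre class="programlisting">
```python
import re

SLOT_NAME = re.compile(r"[a-z0-9_]+")


def is_valid_slot_name(name):
    """True if name uses only lower-case letters, digits, and underscores."""
    return SLOT_NAME.fullmatch(name) is not None


for candidate in ("node_a_slot", "Node-A", ""):
    print(repr(candidate), is_valid_slot_name(candidate))
```
</pre><p>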
  317. Existing replication slots and their state can be seen in the
  318. <a class="link" href="view-pg-replication-slots.html" title="51.81. pg_replication_slots"><code class="structname">pg_replication_slots</code></a>
  319. view.
  320. </p><p>
  321. Slots can be created and dropped either via the streaming replication
  322. protocol (see <a class="xref" href="protocol-replication.html" title="52.4. Streaming Replication Protocol">Section 52.4</a>) or via SQL
  323. functions (see <a class="xref" href="functions-admin.html#FUNCTIONS-REPLICATION" title="9.26.6. Replication Functions">Section 9.26.6</a>).
  324. </p></div><div class="sect3" id="STREAMING-REPLICATION-SLOTS-CONFIG"><div class="titlepage"><div><div><h4 class="title">26.2.6.2. Configuration Example</h4></div></div></div><p>
  325. You can create a replication slot like this:
  326. </p><pre class="programlisting">
postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
  slot_name  | lsn
-------------+-----
 node_a_slot |

postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
  slot_name  | slot_type | active
-------------+-----------+--------
 node_a_slot | physical  | f
(1 row)
  336. </pre><p>
  337. To configure the standby to use this slot, <code class="varname">primary_slot_name</code>
  338. should be configured on the standby. Here is a simple example:
  339. </p><pre class="programlisting">
primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
primary_slot_name = 'node_a_slot'
  342. </pre><p>
  343. </p></div></div><div class="sect2" id="CASCADING-REPLICATION"><div class="titlepage"><div><div><h3 class="title">26.2.7. Cascading Replication</h3></div></div></div><a id="id-1.6.13.16.19.2" class="indexterm"></a><p>
  344. The cascading replication feature allows a standby server to accept replication
  345. connections and stream WAL records to other standbys, acting as a relay.
  346. This can be used to reduce the number of direct connections to the master
  347. and also to minimize inter-site bandwidth overheads.
  348. </p><p>
  349. A standby acting as both a receiver and a sender is known as a cascading
  350. standby. Standbys that are more directly connected to the master are known
  351. as upstream servers, while those standby servers further away are downstream
  352. servers. Cascading replication does not place limits on the number or
  353. arrangement of downstream servers, though each standby connects to only
  354. one upstream server which eventually links to a single master/primary
  355. server.
  356. </p><p>
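Because each standby connects to exactly one upstream server, a
cascading arrangement forms a tree rooted at the primary. The Python
sketch below walks such a tree from a standby up to the primary. The
server names and the mapping are hypothetical, and a server with no
upstream entry is taken to be the primary.
</p><pre class="programlisting">
```python
def path_to_primary(upstream_of, standby):
    """Follow upstream links from a standby until a server with no upstream."""
    path = [standby]
    while path[-1] in upstream_of:
        following = upstream_of[path[-1]]
        if following in path:
            raise ValueError("replication topology contains a loop")
        path.append(following)
    return path


# Hypothetical topology: each standby names exactly one upstream server.
upstream_of = {
    "standby_a": "primary",
    "standby_b": "standby_a",
    "standby_c": "standby_a",
}
print(path_to_primary(upstream_of, "standby_c"))
```
</pre><p>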
  357. A cascading standby sends not only WAL records received from the
  358. master but also those restored from the archive. So even if the replication
  359. connection in some upstream connection is terminated, streaming replication
  360. continues downstream for as long as new WAL records are available.
  361. </p><p>
  362. Cascading replication is currently asynchronous. Synchronous replication
  363. (see <a class="xref" href="warm-standby.html#SYNCHRONOUS-REPLICATION" title="26.2.8. Synchronous Replication">Section 26.2.8</a>) settings have no effect on
  364. cascading replication at present.
  365. </p><p>
  366. Hot Standby feedback propagates upstream, whatever the cascaded arrangement.
  367. </p><p>
  368. If an upstream standby server is promoted to become new master, downstream
  369. servers will continue to stream from the new master if
  370. <code class="varname">recovery_target_timeline</code> is set to <code class="literal">'latest'</code> (the default).
  371. </p><p>
  372. To use cascading replication, set up the cascading standby so that it can
  373. accept replication connections (that is, set
  374. <a class="xref" href="runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</a> and <a class="xref" href="runtime-config-replication.html#GUC-HOT-STANDBY">hot_standby</a>,
  375. and configure
  376. <a class="link" href="auth-pg-hba-conf.html" title="20.1. The pg_hba.conf File">host-based authentication</a>).
  377. You will also need to set <code class="varname">primary_conninfo</code> in the downstream
  378. standby to point to the cascading standby.
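</p><p>
Once both servers are running, the arrangement could be checked with queries along the following lines. This is only a sketch: it assumes a session with sufficient privileges to read the statistics views on each server.
</p><pre class="programlisting">
-- On the cascading standby: list the downstream standbys streaming from it.
SELECT application_name, client_addr, state
  FROM pg_stat_replication;

-- On the downstream standby: show which upstream server it is receiving from.
SELECT status, sender_host, sender_port
  FROM pg_stat_wal_receiver;
</pre><p>
The first query would be expected to list the downstream standby, and the second to report the host and port of the cascading standby rather than those of the master.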
  379. </p></div><div class="sect2" id="SYNCHRONOUS-REPLICATION"><div class="titlepage"><div><div><h3 class="title">26.2.8. Synchronous Replication</h3></div></div></div><a id="id-1.6.13.16.20.2" class="indexterm"></a><p>
  380. <span class="productname">PostgreSQL</span> streaming replication is asynchronous by
  381. default. If the primary server
  382. crashes then some transactions that were committed may not have been
  383. replicated to the standby server, causing data loss. The amount
  384. of data loss is proportional to the replication delay at the time of
  385. failover.
  386. </p><p>
  387. Synchronous replication offers the ability to confirm that all changes
  388. made by a transaction have been transferred to one or more synchronous
  389. standby servers. This extends the standard level of durability
  390. offered by a transaction commit. This level of protection is referred
  391. to as 2-safe replication in computer science theory, and group-1-safe
  392. (group-safe and 1-safe) when <code class="varname">synchronous_commit</code> is set to
  393. <code class="literal">remote_write</code>.
  394. </p><p>
  395. When requesting synchronous replication, each commit of a
  396. write transaction will wait until confirmation is
  397. received that the commit has been written to the write-ahead log on disk
  398. of both the primary and standby server. The only possibility that data
  399. can be lost is if both the primary and the standby suffer crashes at the
  400. same time. This can provide a much higher level of durability, though only
  401. if the sysadmin is cautious about the placement and management of the two
  402. servers. Waiting for confirmation increases the user's confidence that the
  403. changes will not be lost in the event of server crashes but it also
  404. necessarily increases the response time for the requesting transaction.
  405. The minimum wait time is the round-trip time between primary and standby.
  406. </p><p>
  407. Read-only transactions and transaction rollbacks need not wait for
  408. replies from standby servers. Subtransaction commits do not wait for
  409. responses from standby servers, only top-level commits. Long-running
  410. actions such as data loading or index building do not wait
  411. until the very final commit message. All two-phase commit actions
  412. require commit waits, including both prepare and commit.
  413. </p><p>
  414. A synchronous standby can be a physical replication standby or a logical
  415. replication subscriber. It can also be any other physical or logical WAL
  416. replication stream consumer that knows how to send the appropriate
  417. feedback messages. Besides the built-in physical and logical replication
  418. systems, this includes special programs such
  419. as <code class="command">pg_receivewal</code> and <code class="command">pg_recvlogical</code>
  420. as well as some third-party replication systems and custom programs.
  421. Check the respective documentation for details on synchronous replication
  422. support.
  423. </p><div class="sect3" id="SYNCHRONOUS-REPLICATION-CONFIG"><div class="titlepage"><div><div><h4 class="title">26.2.8.1. Basic Configuration</h4></div></div></div><p>
  424. Once streaming replication has been configured, configuring synchronous
  425. replication requires only one additional configuration step:
  426. <a class="xref" href="runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES">synchronous_standby_names</a> must be set to
  427. a non-empty value. <code class="varname">synchronous_commit</code> must also be set to
  428. <code class="literal">on</code>, but since this is the default value, typically no change is
  429. required. (See <a class="xref" href="runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS" title="19.5.1. Settings">Section 19.5.1</a> and
  430. <a class="xref" href="runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-MASTER" title="19.6.2. Master Server">Section 19.6.2</a>.)
  431. This configuration will cause each commit to wait for
  432. confirmation that the standby has written the commit record to durable
  433. storage.
  434. <code class="varname">synchronous_commit</code> can be set by individual
  435. users, so it can be configured in the configuration file, for particular
  436. users or databases, or dynamically by applications, in order to control
  437. the durability guarantee on a per-transaction basis.
  438. </p><p>
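As a minimal sketch of these scopes, <code class="varname">synchronous_commit</code> could be set for a role, for a database, or for the current session as follows. The role <code class="literal">reporting</code> and the database <code class="literal">chatdb</code> are hypothetical names used only for illustration.
</p><pre class="programlisting">
-- Hypothetical role and database names, for illustration only.
ALTER ROLE reporting SET synchronous_commit = local;
ALTER DATABASE chatdb SET synchronous_commit = remote_write;

-- For the current session only.
SET synchronous_commit = on;
</pre><p>
Settings attached to a role or database take effect in sessions started afterwards.
</p><p>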
  439. After a commit record has been written to disk on the primary, the
  440. WAL record is then sent to the standby. The standby sends reply
  441. messages each time a new batch of WAL data is written to disk, unless
  442. <code class="varname">wal_receiver_status_interval</code> is set to zero on the standby.
  443. In the case that <code class="varname">synchronous_commit</code> is set to
  444. <code class="literal">remote_apply</code>, the standby sends reply messages when the commit
  445. record is replayed, making the transaction visible.
  446. If the standby is chosen as a synchronous standby, according to the setting
  447. of <code class="varname">synchronous_standby_names</code> on the primary, the reply
  448. messages from that standby will be considered along with those from other
  449. synchronous standbys to decide when to release transactions waiting for
  450. confirmation that the commit record has been received. These parameters
  451. allow the administrator to specify which standby servers should be
  452. synchronous standbys. Note that the configuration of synchronous
  453. replication is mainly on the master. Named standbys must be directly
  454. connected to the master; the master knows nothing about downstream
  455. standby servers using cascaded replication.
  456. </p><p>
  457. Setting <code class="varname">synchronous_commit</code> to <code class="literal">remote_write</code> will
  458. cause each commit to wait for confirmation that the standby has received
  459. the commit record and written it out to its own operating system, but not
  460. for the data to be flushed to disk on the standby. This
  461. setting provides a weaker guarantee of durability than <code class="literal">on</code>
  462. does: the standby could lose the data in the event of an operating system
  463. crash, though not a <span class="productname">PostgreSQL</span> crash.
  464. However, it's a useful setting in practice
  465. because it can decrease the response time for the transaction.
  466. Data loss could only occur if both the primary and the standby crash and
  467. the database of the primary gets corrupted at the same time.
  468. </p><p>
  469. Setting <code class="varname">synchronous_commit</code> to <code class="literal">remote_apply</code> will
  470. cause each commit to wait until the current synchronous standbys report
  471. that they have replayed the transaction, making it visible to user
  472. queries. In simple cases, this allows for load balancing with causal
  473. consistency.
  474. </p><p>
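The following sketch illustrates the idea. It assumes a hypothetical <code class="literal">accounts</code> table and a standby that is currently a synchronous standby; it is an illustration, not a complete load-balancing setup.
</p><pre class="programlisting">
-- On the primary (accounts is a hypothetical table).
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- waits until the synchronous standbys report replay

-- On a synchronous standby, in a query started after COMMIT returned:
SELECT balance FROM accounts WHERE id = 1;
</pre><p>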
  475. Users will stop waiting if a fast shutdown is requested. However, as
  476. when using asynchronous replication, the server will not fully
  477. shut down until all outstanding WAL records are transferred to the currently
  478. connected standby servers.
  479. </p></div><div class="sect3" id="SYNCHRONOUS-REPLICATION-MULTIPLE-STANDBYS"><div class="titlepage"><div><div><h4 class="title">26.2.8.2. Multiple Synchronous Standbys</h4></div></div></div><p>
  480. Synchronous replication supports one or more synchronous standby servers;
  481. transactions will wait until all the standby servers which are considered
  482. as synchronous confirm receipt of their data. The number of synchronous
  483. standbys that transactions must wait for replies from is specified in
  484. <code class="varname">synchronous_standby_names</code>. This parameter also specifies
  485. a list of standby names and the method (<code class="literal">FIRST</code> and
  486. <code class="literal">ANY</code>) to choose synchronous standbys from the listed ones.
  487. </p><p>
  488. The method <code class="literal">FIRST</code> specifies a priority-based synchronous
  489. replication and makes transaction commits wait until their WAL records are
  490. replicated to the requested number of synchronous standbys chosen based on
  491. their priorities. The standbys whose names appear earlier in the list are
  492. given higher priority and will be considered as synchronous. Other standby
  493. servers appearing later in this list represent potential synchronous
  494. standbys. If any of the current synchronous standbys disconnects for
  495. whatever reason, it will be replaced immediately with the
  496. next-highest-priority standby.
  497. </p><p>
  498. An example of <code class="varname">synchronous_standby_names</code> for
  499. priority-based multiple synchronous standbys is:
  500. </p><pre class="programlisting">
  501. synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
  502. </pre><p>
  503. In this example, if four standby servers <code class="literal">s1</code>, <code class="literal">s2</code>,
  504. <code class="literal">s3</code> and <code class="literal">s4</code> are running, the two standbys
  505. <code class="literal">s1</code> and <code class="literal">s2</code> will be chosen as synchronous standbys
  506. because their names appear early in the list of standby names.
  507. <code class="literal">s3</code> is a potential synchronous standby and will take over
  508. the role of synchronous standby when either of <code class="literal">s1</code> or
  509. <code class="literal">s2</code> fails. <code class="literal">s4</code> is an asynchronous standby since
  510. its name is not in the list.
  511. </p><p>
  512. The method <code class="literal">ANY</code> specifies a quorum-based synchronous
  513. replication and makes transaction commits wait until their WAL records
  514. are replicated to <span class="emphasis"><em>at least</em></span> the requested number of
  515. synchronous standbys in the list.
  516. </p><p>
  517. An example of <code class="varname">synchronous_standby_names</code> for
  518. quorum-based multiple synchronous standbys is:
  519. </p><pre class="programlisting">
  520. synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
  521. </pre><p>
  522. In this example, if four standby servers <code class="literal">s1</code>, <code class="literal">s2</code>,
  523. <code class="literal">s3</code> and <code class="literal">s4</code> are running, transaction commits will
  524. wait for replies from at least any two standbys of <code class="literal">s1</code>,
  525. <code class="literal">s2</code> and <code class="literal">s3</code>. <code class="literal">s4</code> is an asynchronous
  526. standby since its name is not in the list.
  527. </p><p>
  528. The synchronous states of standby servers can be viewed using
  529. the <code class="structname">pg_stat_replication</code> view.
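</p><p>
As a minimal sketch, a query such as the following, run on the primary, shows the role of each connected standby:
</p><pre class="programlisting">
SELECT application_name, state, sync_priority, sync_state
  FROM pg_stat_replication
 ORDER BY sync_priority, application_name;
</pre><p>
With the <code class="literal">FIRST 2 (s1, s2, s3)</code> example above and all four standbys running, <code class="literal">sync_state</code> would be expected to read <code class="literal">sync</code> for <code class="literal">s1</code> and <code class="literal">s2</code>, <code class="literal">potential</code> for <code class="literal">s3</code> and <code class="literal">async</code> for <code class="literal">s4</code>. With the <code class="literal">ANY</code> example, the listed standbys report <code class="literal">quorum</code>.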
  530. </p></div><div class="sect3" id="SYNCHRONOUS-REPLICATION-PERFORMANCE"><div class="titlepage"><div><div><h4 class="title">26.2.8.3. Planning for Performance</h4></div></div></div><p>
  531. Synchronous replication usually requires carefully planned and placed
  532. standby servers to ensure applications perform acceptably. Waiting
  533. doesn't utilize system resources, but transaction locks continue to be
  534. held until the transfer is confirmed. As a result, incautious use of
  535. synchronous replication will reduce performance for database
  536. applications because of increased response times and higher contention.
  537. </p><p>
  538. <span class="productname">PostgreSQL</span> allows the application developer
  539. to specify the durability level required via replication. This can be
  540. specified for the system overall, though it can also be specified for
  541. specific users or connections, or even individual transactions.
  542. </p><p>
  543. For example, an application workload might consist of
  544. 10% changes to important customer details and
  545. 90% changes to less important data that the business can more
  546. easily survive losing, such as chat messages between users.
  547. </p><p>
  548. With synchronous replication options specified at the application level
  549. (on the primary) we can offer synchronous replication for the most
  550. important changes, without slowing down the bulk of the total workload.
  551. Application level options are an important and practical tool for allowing
  552. the benefits of synchronous replication for high performance applications.
  553. </p><p>
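A minimal sketch of this pattern follows. It assumes synchronous replication is already configured on the primary, and the tables <code class="literal">customers</code> and <code class="literal">chat_messages</code> are hypothetical.
</p><pre class="programlisting">
-- Important change: keep the default (synchronous_commit = on),
-- so COMMIT waits for the synchronous standbys.
BEGIN;
UPDATE customers SET email = 'new@example.com' WHERE id = 42;
COMMIT;

-- Less important change: wait only for the local flush to disk,
-- not for any standby, for this one transaction.
BEGIN;
SET LOCAL synchronous_commit = local;
INSERT INTO chat_messages (sender_id, body) VALUES (42, 'hello');
COMMIT;
</pre><p>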
  554. You should consider that the network bandwidth must be higher than
  555. the rate of generation of WAL data.
  556. </p></div><div class="sect3" id="SYNCHRONOUS-REPLICATION-HA"><div class="titlepage"><div><div><h4 class="title">26.2.8.4. Planning for High Availability</h4></div></div></div><p>
  557. <code class="varname">synchronous_standby_names</code> specifies the number and
  558. names of synchronous standbys that transaction commits made when
  559. <code class="varname">synchronous_commit</code> is set to <code class="literal">on</code>,
  560. <code class="literal">remote_apply</code> or <code class="literal">remote_write</code> will wait for
  561. responses from. Such transaction commits may never be completed
  562. if any one of the synchronous standbys should crash.
  563. </p><p>
  564. The best solution for high availability is to ensure you keep as many
  565. synchronous standbys as requested. This can be achieved by naming multiple
  566. potential synchronous standbys using <code class="varname">synchronous_standby_names</code>.
  567. </p><p>
  568. In a priority-based synchronous replication, the standbys whose names
  569. appear earlier in the list will be used as synchronous standbys.
  570. Standbys listed after these will take over the role of synchronous standby
  571. if one of the current ones should fail.
  572. </p><p>
  573. In a quorum-based synchronous replication, all the standbys appearing
  574. in the list will be used as candidates for synchronous standbys.
  575. Even if one of them should fail, the other standbys will keep performing
  576. the role of candidates for synchronous standby.
  577. </p><p>
  578. When a standby first attaches to the primary, it will not yet be properly
  579. synchronized. This is described as <code class="literal">catchup</code> mode. Once
  580. the lag between standby and primary reaches zero for the first time
  581. we move to real-time <code class="literal">streaming</code> state.
  582. The catch-up duration may be long immediately after the standby has
  583. been created. If the standby is shut down, then the catch-up period
  584. will increase according to the length of time the standby has been down.
  585. The standby is only able to become a synchronous standby
  586. once it has reached <code class="literal">streaming</code> state.
  587. This state can be viewed using
  588. the <code class="structname">pg_stat_replication</code> view.
  589. </p><p>
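As a rough sketch, the state of each connected standby and how far it still has to replay could be checked on the primary with a query such as the following. The column alias <code class="literal">replay_lag_bytes</code> is an illustrative choice.
</p><pre class="programlisting">
SELECT application_name,
       state,  -- 'catchup' while the standby is catching up, then 'streaming'
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
  FROM pg_stat_replication;
</pre><p>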
  590. If the primary restarts while commits are waiting for acknowledgment, those
  591. waiting transactions will be marked fully committed once the primary
  592. database recovers.
  593. There is no way to be certain that all standbys have received all
  594. outstanding WAL data at time of the crash of the primary. Some
  595. transactions may not show as committed on the standby, even though
  596. they show as committed on the primary. The guarantee we offer is that
  597. the application will not receive explicit acknowledgment of the
  598. successful commit of a transaction until the WAL data is known to be
  599. safely received by all the synchronous standbys.
  600. </p><p>
  601. If you really cannot keep as many synchronous standbys as requested
  602. then you should decrease the number of synchronous standbys that
  603. transaction commits must wait for responses from
  604. in <code class="varname">synchronous_standby_names</code> (or disable it) and
  605. reload the configuration file on the primary server.
  606. </p><p>
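For instance, starting from the <code class="literal">FIRST 2 (s1, s2, s3)</code> example above, lowering the requirement to one synchronous standby from a superuser session on the primary might look like the following sketch:
</p><pre class="programlisting">
-- Wait for one synchronous standby instead of two.
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (s1, s2, s3)';

-- Reload the configuration so the new value takes effect.
SELECT pg_reload_conf();
</pre><p>
<code class="command">ALTER SYSTEM</code> writes the value to <code class="filename">postgresql.auto.conf</code>; editing <code class="filename">postgresql.conf</code> by hand and then reloading is an alternative.
</p><p>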
  607. If the primary is isolated from remaining standby servers you should
  608. fail over to the best candidate of those other remaining standby servers.
  609. </p><p>
  610. If you need to re-create a standby server while transactions are
  611. waiting, make sure that the commands pg_start_backup() and
  612. pg_stop_backup() are run in a session with
  613. <code class="varname">synchronous_commit</code> = <code class="literal">off</code>, otherwise those
  614. requests will wait forever for the standby to appear.
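</p><p>
A sketch of such a session follows, using the two functions named above in their default exclusive mode. The backup label is a hypothetical name, and the copy step is left to whatever external tool is in use.
</p><pre class="programlisting">
SET synchronous_commit = off;

-- 'rebuild_standby' is a hypothetical label; true requests an immediate checkpoint.
SELECT pg_start_backup('rebuild_standby', true);

-- ... copy the data directory to the new standby with an external tool ...

SELECT pg_stop_backup();
</pre><p>
If the two calls are made from different sessions, each session needs <code class="varname">synchronous_commit</code> set to <code class="literal">off</code>.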
  615. </p></div></div><div class="sect2" id="CONTINUOUS-ARCHIVING-IN-STANDBY"><div class="titlepage"><div><div><h3 class="title">26.2.9. Continuous Archiving in Standby</h3></div></div></div><a id="id-1.6.13.16.21.2" class="indexterm"></a><p>
  616. When continuous WAL archiving is used in a standby, there are two
  617. different scenarios: the WAL archive can be shared between the primary
  618. and the standby, or the standby can have its own WAL archive. When
  619. the standby has its own WAL archive, set <code class="varname">archive_mode</code>
  620. to <code class="literal">always</code>, and the standby will call the archive
  621. command for every WAL segment it receives, whether it's by restoring
  622. from the archive or by streaming replication. The shared archive can
  623. be handled similarly, but the <code class="varname">archive_command</code> must
  624. test if the file being archived exists already, and if the existing file
  625. has identical contents. This requires more care in the
  626. <code class="varname">archive_command</code>, as it must
  627. be careful to not overwrite an existing file with different contents,
  628. but return success if exactly the same file is archived twice. And
  629. all that must be done free of race conditions, if two servers attempt
  630. to archive the same file at the same time.
  631. </p><p>
  632. If <code class="varname">archive_mode</code> is set to <code class="literal">on</code>, the
  633. archiver is not enabled during recovery or standby mode. If the standby
  634. server is promoted, it will start archiving after the promotion, but
  635. will not archive any WAL it did not generate itself. To get a complete
  636. series of WAL files in the archive, you must ensure that all WAL is
  637. archived before it reaches the standby. This is inherently true with
  638. file-based log shipping, as the standby can only restore files that
  639. are found in the archive, but not if streaming replication is enabled.
  640. When a server is not in recovery mode, there is no difference between
  641. <code class="literal">on</code> and <code class="literal">always</code> modes.
  642. </p></div></div></body></html>