14.4. Populating a Database

One might need to insert a large amount of data when first populating a database. This section contains some suggestions on how to make this process as efficient as possible.

- </p><div class="sect2" id="DISABLE-AUTOCOMMIT"><div class="titlepage"><div><div><h3 class="title">14.4.1. Disable Autocommit</h3></div></div></div><a id="id-1.5.13.7.3.2" class="indexterm"></a><p>
- When using multiple <code class="command">INSERT</code>s, turn off autocommit and just do
- one commit at the end. (In plain
- SQL, this means issuing <code class="command">BEGIN</code> at the start and
- <code class="command">COMMIT</code> at the end. Some client libraries might
- do this behind your back, in which case you need to make sure the
- library does it when you want it done.) If you allow each
- insertion to be committed separately,
- <span class="productname">PostgreSQL</span> is doing a lot of work for
- each row that is added. An additional benefit of doing all
- insertions in one transaction is that if the insertion of one row
- were to fail then the insertion of all rows inserted up to that
- point would be rolled back, so you won't be stuck with partially
- loaded data.
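For example, a minimal sketch in plain SQL, using a hypothetical measurements table:

    BEGIN;
    INSERT INTO measurements (city, reading) VALUES ('Berlin', 20.1);
    INSERT INTO measurements (city, reading) VALUES ('Madrid', 24.7);
    -- ... many more INSERTs ...
    COMMIT;  -- all rows commit together; an error before this point rolls back everything
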
- </p></div><div class="sect2" id="POPULATE-COPY-FROM"><div class="titlepage"><div><div><h3 class="title">14.4.2. Use <code class="command">COPY</code></h3></div></div></div><p>
- Use <a class="xref" href="sql-copy.html" title="COPY"><span class="refentrytitle">COPY</span></a> to load
- all the rows in one command, instead of using a series of
- <code class="command">INSERT</code> commands. The <code class="command">COPY</code>
- command is optimized for loading large numbers of rows; it is less
- flexible than <code class="command">INSERT</code>, but incurs significantly
- less overhead for large data loads. Since <code class="command">COPY</code>
- is a single command, there is no need to disable autocommit if you
- use this method to populate a table.
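As an illustration (the table, columns, and file path are hypothetical), a single COPY replaces a long series of INSERTs:

    -- Load every row from a server-side CSV file in one command.
    COPY measurements (city, reading)
        FROM '/tmp/measurements.csv'
        WITH (FORMAT csv, HEADER true);

With psql, the client-side \copy variant of this command reads the file from the client machine instead of the server.
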
If you cannot use COPY, it might help to use PREPARE to create a prepared INSERT statement, and then use EXECUTE as many times as required. This avoids some of the overhead of repeatedly parsing and planning INSERT. Different interfaces provide this facility in different ways; look for "prepared statements" in the interface documentation.

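A minimal sketch of that approach in plain SQL, again with hypothetical names:

    -- Parse and plan the INSERT once ...
    PREPARE bulk_insert (text, numeric) AS
        INSERT INTO measurements (city, reading) VALUES ($1, $2);

    -- ... then execute it as many times as required.
    EXECUTE bulk_insert('Berlin', 20.1);
    EXECUTE bulk_insert('Madrid', 24.7);

    DEALLOCATE bulk_insert;
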
Note that loading a large number of rows using COPY is almost always faster than using INSERT, even if PREPARE is used and multiple insertions are batched into a single transaction.

COPY is fastest when used within the same transaction as an earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway. However, this consideration only applies to non-partitioned tables when wal_level is minimal, as all commands must write WAL otherwise.

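For example, a sketch of that pattern, assuming wal_level is minimal and reusing the hypothetical names from above:

    BEGIN;
    TRUNCATE measurements;   -- or CREATE TABLE measurements (...) in the same transaction
    COPY measurements (city, reading)
        FROM '/tmp/measurements.csv'
        WITH (FORMAT csv, HEADER true);
    COMMIT;                  -- with wal_level = minimal, WAL for the loaded rows can be skipped
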
- </p></div><div class="sect2" id="POPULATE-RM-INDEXES"><div class="titlepage"><div><div><h3 class="title">14.4.3. Remove Indexes</h3></div></div></div><p>
- If you are loading a freshly created table, the fastest method is to
- create the table, bulk load the table's data using
- <code class="command">COPY</code>, then create any indexes needed for the
- table. Creating an index on pre-existing data is quicker than
- updating it incrementally as each row is loaded.
If you are adding large amounts of data to an existing table, it might be a win to drop the indexes, load the table, and then recreate the indexes. Of course, the database performance for other users might suffer during the time the indexes are missing. One should also think twice before dropping a unique index, since the error checking afforded by the unique constraint will be lost while the index is missing.

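A sketch of the drop-load-recreate pattern, with the index and table names invented for illustration:

    DROP INDEX IF EXISTS measurements_city_idx;

    -- ... bulk load the data, e.g. with COPY ...

    -- Rebuild the index once, over the complete data set.
    CREATE INDEX measurements_city_idx ON measurements (city);
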
- </p></div><div class="sect2" id="POPULATE-RM-FKEYS"><div class="titlepage"><div><div><h3 class="title">14.4.4. Remove Foreign Key Constraints</h3></div></div></div><p>
- Just as with indexes, a foreign key constraint can be checked
- <span class="quote">“<span class="quote">in bulk</span>”</span> more efficiently than row-by-row. So it might be
- useful to drop foreign key constraints, load data, and re-create
- the constraints. Again, there is a trade-off between data load
- speed and loss of error checking while the constraint is missing.
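A sketch of that sequence, using a hypothetical orders table that references a customers table (the constraint name is also made up):

    ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;

    -- ... load the data ...

    -- Re-adding the constraint validates all rows in one pass.
    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_id_fkey
        FOREIGN KEY (customer_id) REFERENCES customers (id);
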
What's more, when you load data into a table with existing foreign key constraints, each new row requires an entry in the server's list of pending trigger events (since it is the firing of a trigger that checks the row's foreign key constraint). Loading many millions of rows can cause the trigger event queue to overflow available memory, leading to intolerable swapping or even outright failure of the command. Therefore it may be necessary, not just desirable, to drop and re-apply foreign keys when loading large amounts of data. If temporarily removing the constraint isn't acceptable, the only other recourse may be to split up the load operation into smaller transactions.

- </p></div><div class="sect2" id="POPULATE-WORK-MEM"><div class="titlepage"><div><div><h3 class="title">14.4.5. Increase <code class="varname">maintenance_work_mem</code></h3></div></div></div><p>
- Temporarily increasing the <a class="xref" href="runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM">maintenance_work_mem</a>
- configuration variable when loading large amounts of data can
- lead to improved performance. This will help to speed up <code class="command">CREATE
- INDEX</code> commands and <code class="command">ALTER TABLE ADD FOREIGN KEY</code> commands.
- It won't do much for <code class="command">COPY</code> itself, so this advice is
- only useful when you are using one or both of the above techniques.
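For example, it can be raised for the current session only; the value below is just an illustration, so size it to the memory you can actually spare:

    SET maintenance_work_mem = '2GB';   -- session-local; reverts on RESET or disconnect
    -- ... CREATE INDEX / ALTER TABLE ... ADD FOREIGN KEY ...
    RESET maintenance_work_mem;
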
- </p></div><div class="sect2" id="POPULATE-MAX-WAL-SIZE"><div class="titlepage"><div><div><h3 class="title">14.4.6. Increase <code class="varname">max_wal_size</code></h3></div></div></div><p>
- Temporarily increasing the <a class="xref" href="runtime-config-wal.html#GUC-MAX-WAL-SIZE">max_wal_size</a>
- configuration variable can also
- make large data loads faster. This is because loading a large
- amount of data into <span class="productname">PostgreSQL</span> will
- cause checkpoints to occur more often than the normal checkpoint
- frequency (specified by the <code class="varname">checkpoint_timeout</code>
- configuration variable). Whenever a checkpoint occurs, all dirty
- pages must be flushed to disk. By increasing
- <code class="varname">max_wal_size</code> temporarily during bulk
- data loads, the number of checkpoints that are required can be
- reduced.
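max_wal_size can be changed without a server restart, so one option is to raise it just for the duration of the load, for instance (the value is illustrative):

    ALTER SYSTEM SET max_wal_size = '16GB';
    SELECT pg_reload_conf();            -- pick up the new setting
    -- ... perform the bulk load ...
    ALTER SYSTEM RESET max_wal_size;
    SELECT pg_reload_conf();
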
- </p></div><div class="sect2" id="POPULATE-PITR"><div class="titlepage"><div><div><h3 class="title">14.4.7. Disable WAL Archival and Streaming Replication</h3></div></div></div><p>
- When loading large amounts of data into an installation that uses
- WAL archiving or streaming replication, it might be faster to take a
- new base backup after the load has completed than to process a large
- amount of incremental WAL data. To prevent incremental WAL logging
- while loading, disable archiving and streaming replication, by setting
- <a class="xref" href="runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</a> to <code class="literal">minimal</code>,
- <a class="xref" href="runtime-config-wal.html#GUC-ARCHIVE-MODE">archive_mode</a> to <code class="literal">off</code>, and
- <a class="xref" href="runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</a> to zero.
- But note that changing these settings requires a server restart.
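A sketch of the corresponding settings; they can be edited in postgresql.conf or set with ALTER SYSTEM, and either way a restart is needed before and after:

    ALTER SYSTEM SET wal_level = 'minimal';
    ALTER SYSTEM SET archive_mode = 'off';
    ALTER SYSTEM SET max_wal_senders = 0;
    -- Restart the server for these to take effect.  After the load, restore the
    -- previous values, restart again, and take a fresh base backup.
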
Aside from avoiding the time for the archiver or WAL sender to process the WAL data, doing this will actually make certain commands faster, because they are designed not to write WAL at all if wal_level is minimal. (They can guarantee crash safety more cheaply by doing an fsync at the end than by writing WAL.) This applies to the following commands:

- CREATE TABLE AS SELECT
- CREATE INDEX (and variants such as ALTER TABLE ADD PRIMARY KEY)
- ALTER TABLE SET TABLESPACE
- CLUSTER
- COPY FROM, when the target table has been created or truncated earlier in the same transaction

- </p></div><div class="sect2" id="POPULATE-ANALYZE"><div class="titlepage"><div><div><h3 class="title">14.4.8. Run <code class="command">ANALYZE</code> Afterwards</h3></div></div></div><p>
- Whenever you have significantly altered the distribution of data
- within a table, running <a class="xref" href="sql-analyze.html" title="ANALYZE"><span class="refentrytitle">ANALYZE</span></a> is strongly recommended. This
- includes bulk loading large amounts of data into the table. Running
- <code class="command">ANALYZE</code> (or <code class="command">VACUUM ANALYZE</code>)
- ensures that the planner has up-to-date statistics about the
- table. With no statistics or obsolete statistics, the planner might
- make poor decisions during query planning, leading to poor
- performance on any tables with inaccurate or nonexistent
- statistics. Note that if the autovacuum daemon is enabled, it might
- run <code class="command">ANALYZE</code> automatically; see
- <a class="xref" href="routine-vacuuming.html#VACUUM-FOR-STATISTICS" title="24.1.3. Updating Planner Statistics">Section 24.1.3</a>
- and <a class="xref" href="routine-vacuuming.html#AUTOVACUUM" title="24.1.6. The Autovacuum Daemon">Section 24.1.6</a> for more information.
- </p></div><div class="sect2" id="POPULATE-PG-DUMP"><div class="titlepage"><div><div><h3 class="title">14.4.9. Some Notes about <span class="application">pg_dump</span></h3></div></div></div><p>
- Dump scripts generated by <span class="application">pg_dump</span> automatically apply
- several, but not all, of the above guidelines. To reload a
- <span class="application">pg_dump</span> dump as quickly as possible, you need to
- do a few extra things manually. (Note that these points apply while
- <span class="emphasis"><em>restoring</em></span> a dump, not while <span class="emphasis"><em>creating</em></span> it.
- The same points apply whether loading a text dump with
- <span class="application">psql</span> or using <span class="application">pg_restore</span> to load
- from a <span class="application">pg_dump</span> archive file.)
By default, pg_dump uses COPY, and when it is generating a complete schema-and-data dump, it is careful to load data before creating indexes and foreign keys. So in this case several guidelines are handled automatically. What is left for you to do is to:

- </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
- Set appropriate (i.e., larger than normal) values for
- <code class="varname">maintenance_work_mem</code> and
- <code class="varname">max_wal_size</code>.
- </p></li><li class="listitem"><p>
- If using WAL archiving or streaming replication, consider disabling
- them during the restore. To do that, set <code class="varname">archive_mode</code>
- to <code class="literal">off</code>,
- <code class="varname">wal_level</code> to <code class="literal">minimal</code>, and
- <code class="varname">max_wal_senders</code> to zero before loading the dump.
- Afterwards, set them back to the right values and take a fresh
- base backup.
- </p></li><li class="listitem"><p>
- Experiment with the parallel dump and restore modes of both
- <span class="application">pg_dump</span> and <span class="application">pg_restore</span> and find the
- optimal number of concurrent jobs to use. Dumping and restoring in
- parallel by means of the <code class="option">-j</code> option should give you a
- significantly higher performance over the serial mode.
- </p></li><li class="listitem"><p>
- Consider whether the whole dump should be restored as a single
- transaction. To do that, pass the <code class="option">-1</code> or
- <code class="option">--single-transaction</code> command-line option to
- <span class="application">psql</span> or <span class="application">pg_restore</span>. When using this
- mode, even the smallest of errors will rollback the entire restore,
- possibly discarding many hours of processing. Depending on how
- interrelated the data is, that might seem preferable to manual cleanup,
- or not. <code class="command">COPY</code> commands will run fastest if you use a single
- transaction and have WAL archiving turned off.
- </p></li><li class="listitem"><p>
- If multiple CPUs are available in the database server, consider using
- <span class="application">pg_restore</span>'s <code class="option">--jobs</code> option. This
- allows concurrent data loading and index creation.
- </p></li><li class="listitem"><p>
- Run <code class="command">ANALYZE</code> afterwards.
- </p></li></ul></div><p>
A data-only dump will still use COPY, but it does not drop or recreate indexes, and it does not normally touch foreign keys. [13] So when loading a data-only dump, it is up to you to drop and recreate indexes and foreign keys if you wish to use those techniques. It's still useful to increase max_wal_size while loading the data, but don't bother increasing maintenance_work_mem; rather, you'd do that while manually recreating indexes and foreign keys afterwards. And don't forget to ANALYZE when you're done; see Section 24.1.3 and Section 24.1.6 for more information.

- </p></div><div class="footnotes"><br /><hr style="width:100; text-align:left;margin-left: 0" /><div id="ftn.id-1.5.13.7.11.4.2" class="footnote"><p><a href="#id-1.5.13.7.11.4.2" class="para"><sup class="para">[13] </sup></a>
- You can get the effect of disabling foreign keys by using
- the <code class="option">--disable-triggers</code> option — but realize that
- that eliminates, rather than just postpones, foreign key
- validation, and so it is possible to insert bad data if you use it.