gooderp18绿色标准版
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

209 lines
15KB

  1. <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>11.9. Index-Only Scans and Covering Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="indexes-partial.html" title="11.8. Partial Indexes" /><link rel="next" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">11.9. Index-Only Scans and Covering Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="indexes-partial.html" title="11.8. Partial Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><th width="60%" align="center">Chapter 11. Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="INDEXES-INDEX-ONLY-SCANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">11.9. Index-Only Scans and Covering Indexes</h2></div></div></div><a id="id-1.5.10.12.2" class="indexterm"></a><a id="id-1.5.10.12.3" class="indexterm"></a><a id="id-1.5.10.12.4" class="indexterm"></a><a id="id-1.5.10.12.5" class="indexterm"></a><p>
  3. All indexes in <span class="productname">PostgreSQL</span>
  4. are <em class="firstterm">secondary</em> indexes, meaning that each index is
  5. stored separately from the table's main data area (which is called the
  6. table's <em class="firstterm">heap</em>
  7. in <span class="productname">PostgreSQL</span> terminology). This means that
  8. in an ordinary index scan, each row retrieval requires fetching data from
  9. both the index and the heap. Furthermore, while the index entries that
  10. match a given indexable <code class="literal">WHERE</code> condition are usually
  11. close together in the index, the table rows they reference might be
  12. anywhere in the heap. The heap-access portion of an index scan thus
  13. involves a lot of random access into the heap, which can be slow,
  14. particularly on traditional rotating media. (As described in
  15. <a class="xref" href="indexes-bitmap-scans.html" title="11.5. Combining Multiple Indexes">Section 11.5</a>, bitmap scans try to alleviate
  16. this cost by doing the heap accesses in sorted order, but that only goes
  17. so far.)
  18. </p><p>
  19. To solve this performance problem, <span class="productname">PostgreSQL</span>
  20. supports <em class="firstterm">index-only scans</em>, which can answer
  21. queries from an index alone without any heap access. The basic idea is
  22. to return values directly out of each index entry instead of consulting
  23. the associated heap entry. There are two fundamental restrictions on
  24. when this method can be used:
  25. </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
  26. The index type must support index-only scans. B-tree indexes always
  27. do. GiST and SP-GiST indexes support index-only scans for some
  28. operator classes but not others. Other index types have no support.
  29. The underlying requirement is that the index must physically store, or
  30. else be able to reconstruct, the original data value for each index
  31. entry. As a counterexample, GIN indexes cannot support index-only
  32. scans because each index entry typically holds only part of the
  33. original data value.
  34. </p></li><li class="listitem"><p>
  35. The query must reference only columns stored in the index. For
  36. example, given an index on columns <code class="literal">x</code>
  37. and <code class="literal">y</code> of a table that also has a
  38. column <code class="literal">z</code>, these queries could use index-only scans:
  39. </p><pre class="programlisting">
  40. SELECT x, y FROM tab WHERE x = 'key';
  41. SELECT x FROM tab WHERE x = 'key' AND y &lt; 42;
  42. </pre><p>
  43. but these queries could not:
  44. </p><pre class="programlisting">
  45. SELECT x, z FROM tab WHERE x = 'key';
  46. SELECT x FROM tab WHERE x = 'key' AND z &lt; 42;
  47. </pre><p>
  48. (Expression indexes and partial indexes complicate this rule,
  49. as discussed below.)
  50. </p></li></ol></div><p>
  51. </p><p>
  52. If these two fundamental requirements are met, then all the data values
  53. required by the query are available from the index, so an index-only scan
  54. is physically possible. But there is an additional requirement for any
  55. table scan in <span class="productname">PostgreSQL</span>: it must verify that
  56. each retrieved row be <span class="quote">“<span class="quote">visible</span>”</span> to the query's MVCC
  57. snapshot, as discussed in <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>. Visibility information
  58. is not stored in index entries, only in heap entries; so at first glance
  59. it would seem that every row retrieval would require a heap access
  60. anyway. And this is indeed the case, if the table row has been modified
  61. recently. However, for seldom-changing data there is a way around this
  62. problem. <span class="productname">PostgreSQL</span> tracks, for each page in
  63. a table's heap, whether all rows stored in that page are old enough to be
  64. visible to all current and future transactions. This information is
  65. stored in a bit in the table's <em class="firstterm">visibility map</em>. An
  66. index-only scan, after finding a candidate index entry, checks the
  67. visibility map bit for the corresponding heap page. If it's set, the row
  68. is known visible and so the data can be returned with no further work.
  69. If it's not set, the heap entry must be visited to find out whether it's
  70. visible, so no performance advantage is gained over a standard index
  71. scan. Even in the successful case, this approach trades visibility map
  72. accesses for heap accesses; but since the visibility map is four orders
  73. of magnitude smaller than the heap it describes, far less physical I/O is
  74. needed to access it. In most situations the visibility map remains
  75. cached in memory all the time.
  76. </p><p>
  77. In short, while an index-only scan is possible given the two fundamental
  78. requirements, it will be a win only if a significant fraction of the
  79. table's heap pages have their all-visible map bits set. But tables in
  80. which a large fraction of the rows are unchanging are common enough to
  81. make this type of scan very useful in practice.
  82. </p><p>
  83. <a id="id-1.5.10.12.10.1" class="indexterm"></a>
  84. To make effective use of the index-only scan feature, you might choose to
  85. create a <em class="firstterm">covering index</em>, which is an index
  86. specifically designed to include the columns needed by a particular
  87. type of query that you run frequently. Since queries typically need to
  88. retrieve more columns than just the ones they search
  89. on, <span class="productname">PostgreSQL</span> allows you to create an index
  90. in which some columns are just <span class="quote">“<span class="quote">payload</span>”</span> and are not part
  91. of the search key. This is done by adding an <code class="literal">INCLUDE</code>
  92. clause listing the extra columns. For example, if you commonly run
  93. queries like
  94. </p><pre class="programlisting">
  95. SELECT y FROM tab WHERE x = 'key';
  96. </pre><p>
  97. the traditional approach to speeding up such queries would be to create
  98. an index on <code class="literal">x</code> only. However, an index defined as
  99. </p><pre class="programlisting">
  100. CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
  101. </pre><p>
  102. could handle these queries as index-only scans,
  103. because <code class="literal">y</code> can be obtained from the index without
  104. visiting the heap.
  105. </p><p>
  106. Because column <code class="literal">y</code> is not part of the index's search
  107. key, it does not have to be of a data type that the index can handle;
  108. it's merely stored in the index and is not interpreted by the index
  109. machinery. Also, if the index is a unique index, that is
  110. </p><pre class="programlisting">
  111. CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
  112. </pre><p>
  113. the uniqueness condition applies to just column <code class="literal">x</code>,
  114. not to the combination of <code class="literal">x</code> and <code class="literal">y</code>.
  115. (An <code class="literal">INCLUDE</code> clause can also be written
  116. in <code class="literal">UNIQUE</code> and <code class="literal">PRIMARY KEY</code>
  117. constraints, providing alternative syntax for setting up an index like
  118. this.)
  119. </p><p>
  120. It's wise to be conservative about adding non-key payload columns to an
  121. index, especially wide columns. If an index tuple exceeds the
  122. maximum size allowed for the index type, data insertion will fail.
  123. In any case, non-key columns duplicate data from the index's table
  124. and bloat the size of the index, thus potentially slowing searches.
  125. And remember that there is little point in including payload columns in an
  126. index unless the table changes slowly enough that an index-only scan is
  127. likely to not need to access the heap. If the heap tuple must be visited
  128. anyway, it costs nothing more to get the column's value from there.
  129. Other restrictions are that expressions are not currently supported as
  130. included columns, and that only B-tree and GiST indexes currently support
  131. included columns.
  132. </p><p>
  133. Before <span class="productname">PostgreSQL</span> had
  134. the <code class="literal">INCLUDE</code> feature, people sometimes made covering
  135. indexes by writing the payload columns as ordinary index columns,
  136. that is writing
  137. </p><pre class="programlisting">
  138. CREATE INDEX tab_x_y ON tab(x, y);
  139. </pre><p>
  140. even though they had no intention of ever using <code class="literal">y</code> as
  141. part of a <code class="literal">WHERE</code> clause. This works fine as long as
  142. the extra columns are trailing columns; making them be leading columns is
  143. unwise for the reasons explained in <a class="xref" href="indexes-multicolumn.html" title="11.3. Multicolumn Indexes">Section 11.3</a>.
  144. However, this method doesn't support the case where you want the index to
  145. enforce uniqueness on the key column(s).
  146. </p><p>
  147. <em class="firstterm">Suffix truncation</em> always removes non-key
  148. columns from upper B-Tree levels. As payload columns, they are
  149. never used to guide index scans. The truncation process also
  150. removes one or more trailing key column(s) when the remaining
  151. prefix of key column(s) happens to be sufficient to describe tuples
  152. on the lowest B-Tree level. In practice, covering indexes without
  153. an <code class="literal">INCLUDE</code> clause often avoid storing columns
  154. that are effectively payload in the upper levels. However,
  155. explicitly defining payload columns as non-key columns
  156. <span class="emphasis"><em>reliably</em></span> keeps the tuples in upper levels
  157. small.
  158. </p><p>
  159. In principle, index-only scans can be used with expression indexes.
  160. For example, given an index on <code class="literal">f(x)</code>
  161. where <code class="literal">x</code> is a table column, it should be possible to
  162. execute
  163. </p><pre class="programlisting">
  164. SELECT f(x) FROM tab WHERE f(x) &lt; 1;
  165. </pre><p>
  166. as an index-only scan; and this is very attractive
  167. if <code class="literal">f()</code> is an expensive-to-compute function.
  168. However, <span class="productname">PostgreSQL</span>'s planner is currently not
  169. very smart about such cases. It considers a query to be potentially
  170. executable by index-only scan only when all <span class="emphasis"><em>columns</em></span>
  171. needed by the query are available from the index. In this
  172. example, <code class="literal">x</code> is not needed except in the
  173. context <code class="literal">f(x)</code>, but the planner does not notice that and
  174. concludes that an index-only scan is not possible. If an index-only scan
  175. seems sufficiently worthwhile, this can be worked around by
  176. adding <code class="literal">x</code> as an included column, for example
  177. </p><pre class="programlisting">
  178. CREATE INDEX tab_f_x ON tab (f(x)) INCLUDE (x);
  179. </pre><p>
  180. An additional caveat, if the goal is to avoid
  181. recalculating <code class="literal">f(x)</code>, is that the planner won't
  182. necessarily match uses of <code class="literal">f(x)</code> that aren't in
  183. indexable <code class="literal">WHERE</code> clauses to the index column. It will
  184. usually get this right in simple queries such as shown above, but not in
  185. queries that involve joins. These deficiencies may be remedied in future
  186. versions of <span class="productname">PostgreSQL</span>.
  187. </p><p>
  188. Partial indexes also have interesting interactions with index-only scans.
  189. Consider the partial index shown in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX3" title="Example 11.3. Setting up a Partial Unique Index">Example 11.3</a>:
  190. </p><pre class="programlisting">
  191. CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
  192. WHERE success;
  193. </pre><p>
  194. In principle, we could do an index-only scan on this index to satisfy a
  195. query like
  196. </p><pre class="programlisting">
  197. SELECT target FROM tests WHERE subject = 'some-subject' AND success;
  198. </pre><p>
  199. But there's a problem: the <code class="literal">WHERE</code> clause refers
  200. to <code class="literal">success</code> which is not available as a result column
  201. of the index. Nonetheless, an index-only scan is possible because the
  202. plan does not need to recheck that part of the <code class="literal">WHERE</code>
  203. clause at run time: all entries found in the index necessarily
  204. have <code class="literal">success = true</code> so this need not be explicitly
  205. checked in the plan. <span class="productname">PostgreSQL</span> versions 9.6
  206. and later will recognize such cases and allow index-only scans to be
  207. generated, but older versions will not.
  208. </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="indexes-partial.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexes.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="indexes-opclass.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">11.8. Partial Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 11.10. Operator Classes and Operator Families</td></tr></table></div></body></html>
上海开阖软件有限公司 沪ICP备12045867号-1