gooderp18绿色标准版
Nelze vybrat více než 25 témat Téma musí začínat písmenem nebo číslem, může obsahovat pomlčky („-“) a může být dlouhé až 35 znaků.

183 lines
13KB

  1. <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.5. bloom</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="auto-explain.html" title="F.4. auto_explain" /><link rel="next" href="btree-gin.html" title="F.6. btree_gin" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.5. bloom</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="auto-explain.html" title="F.4. auto_explain">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.6. btree_gin">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="BLOOM"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.5. bloom</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.7">F.5.1. Parameters</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.8">F.5.2. Examples</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.9">F.5.3. Operator Class Interface</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.10">F.5.4. Limitations</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.11">F.5.5. Authors</a></span></dt></dl></div><a id="id-1.11.7.14.2" class="indexterm"></a><p>
  3. <code class="literal">bloom</code> provides an index access method based on
  4. <a class="ulink" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a>.
  5. </p><p>
  6. A Bloom filter is a space-efficient data structure that is used to test
  7. whether an element is a member of a set. In the case of an index access
  8. method, it allows fast exclusion of non-matching tuples via signatures
  9. whose size is determined at index creation.
  10. </p><p>
  11. A signature is a lossy representation of the indexed attribute(s), and as
  12. such is prone to reporting false positives; that is, it may be reported
  13. that an element is in the set, when it is not. So index search results
  14. must always be rechecked using the actual attribute values from the heap
  15. entry. Larger signatures reduce the odds of a false positive and thus
  16. reduce the number of useless heap visits, but of course also make the index
  17. larger and hence slower to scan.
  18. </p><p>
  19. This type of index is most useful when a table has many attributes and
  20. queries test arbitrary combinations of them. A traditional btree index is
  21. faster than a bloom index, but it can require many btree indexes to support
  22. all possible queries where one needs only a single bloom index. Note
  23. however that bloom indexes only support equality queries, whereas btree
  24. indexes can also perform inequality and range searches.
  25. </p><div class="sect2" id="id-1.11.7.14.7"><div class="titlepage"><div><div><h3 class="title">F.5.1. Parameters</h3></div></div></div><p>
  26. A <code class="literal">bloom</code> index accepts the following parameters in its
  27. <code class="literal">WITH</code> clause:
  28. </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">length</code></span></dt><dd><p>
  29. Length of each signature (index entry) in bits. It is rounded up to the
  30. nearest multiple of <code class="literal">16</code>. The default is
  31. <code class="literal">80</code> bits and the maximum is <code class="literal">4096</code>.
  32. </p></dd></dl></div><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">col1 — col32</code></span></dt><dd><p>
  33. Number of bits generated for each index column. Each parameter's name
  34. refers to the number of the index column that it controls. The default
  35. is <code class="literal">2</code> bits and the maximum is <code class="literal">4095</code>.
  36. Parameters for index columns not actually used are ignored.
  37. </p></dd></dl></div></div><div class="sect2" id="id-1.11.7.14.8"><div class="titlepage"><div><div><h3 class="title">F.5.2. Examples</h3></div></div></div><p>
  38. This is an example of creating a bloom index:
  39. </p><pre class="programlisting">
  40. CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
  41. WITH (length=80, col1=2, col2=2, col3=4);
  42. </pre><p>
  43. The index is created with a signature length of 80 bits, with attributes
  44. i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could
  45. have omitted the <code class="literal">length</code>, <code class="literal">col1</code>,
  46. and <code class="literal">col2</code> specifications since those have the default values.
  47. </p><p>
  48. Here is a more complete example of bloom index definition and usage, as
  49. well as a comparison with equivalent btree indexes. The bloom index is
  50. considerably smaller than the btree index, and can perform better.
  51. </p><pre class="programlisting">
  52. =# CREATE TABLE tbloom AS
  53. SELECT
  54. (random() * 1000000)::int as i1,
  55. (random() * 1000000)::int as i2,
  56. (random() * 1000000)::int as i3,
  57. (random() * 1000000)::int as i4,
  58. (random() * 1000000)::int as i5,
  59. (random() * 1000000)::int as i6
  60. FROM
  61. generate_series(1,10000000);
  62. SELECT 10000000
  63. =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
  64. CREATE INDEX
  65. =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
  66. pg_size_pretty
  67. ----------------
  68. 153 MB
  69. (1 row)
  70. =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
  71. CREATE INDEX
  72. =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
  73. pg_size_pretty
  74. ----------------
  75. 387 MB
  76. (1 row)
  77. </pre><p>
  78. A sequential scan over this large table takes a long time:
  79. </p><pre class="programlisting">
  80. =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
  81. QUERY PLAN
  82. ------------------------------------------------------------------------------------------------------------
  83. Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
  84. Filter: ((i2 = 898732) AND (i5 = 123451))
  85. Rows Removed by Filter: 10000000
  86. Planning time: 0.177 ms
  87. Execution time: 1445.473 ms
  88. (5 rows)
  89. </pre><p>
  90. </p><p>
  91. So the planner will usually select an index scan if possible.
  92. With a btree index, we get results like this:
  93. </p><pre class="programlisting">
  94. =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
  95. QUERY PLAN
  96. --------------------------------------------------------------------------------------------------------------------------------
  97. Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
  98. Index Cond: ((i2 = 898732) AND (i5 = 123451))
  99. Heap Fetches: 0
  100. Planning time: 0.193 ms
  101. Execution time: 445.770 ms
  102. (5 rows)
  103. </pre><p>
  104. </p><p>
  105. Bloom is better than btree in handling this type of search:
  106. </p><pre class="programlisting">
  107. =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
  108. QUERY PLAN
  109. ---------------------------------------------------------------------------------------------------------------------------
  110. Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
  111. Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
  112. Rows Removed by Index Recheck: 2439
  113. Heap Blocks: exact=2408
  114. -&gt; Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
  115. Index Cond: ((i2 = 898732) AND (i5 = 123451))
  116. Planning time: 0.475 ms
  117. Execution time: 76.778 ms
  118. (8 rows)
  119. </pre><p>
  120. Note the relatively large number of false positives: 2439 rows were
  121. selected to be visited in the heap, but none actually matched the
  122. query. We could reduce that by specifying a larger signature length.
  123. In this example, creating the index with <code class="literal">length=200</code>
  124. reduced the number of false positives to 55; but it doubled the index size
  125. (to 306 MB) and ended up being slower for this query (125 ms overall).
  126. </p><p>
  127. Now, the main problem with the btree search is that btree is inefficient
  128. when the search conditions do not constrain the leading index column(s).
  129. A better strategy for btree is to create a separate index on each column.
  130. Then the planner will choose something like this:
  131. </p><pre class="programlisting">
  132. =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
  133. QUERY PLAN
  134. ------------------------------------------------------------------------------------------------------------------------------
  135. Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
  136. Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
  137. -&gt; BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
  138. -&gt; Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
  139. Index Cond: (i5 = 123451)
  140. -&gt; Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
  141. Index Cond: (i2 = 898732)
  142. Planning time: 2.049 ms
  143. Execution time: 0.280 ms
  144. (9 rows)
  145. </pre><p>
  146. Although this query runs much faster than with either of the single
  147. indexes, we pay a large penalty in index size. Each of the single-column
  148. btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
  149. more than 8 times the space used by the bloom index.
  150. </p></div><div class="sect2" id="id-1.11.7.14.9"><div class="titlepage"><div><div><h3 class="title">F.5.3. Operator Class Interface</h3></div></div></div><p>
  151. An operator class for bloom indexes requires only a hash function for the
  152. indexed data type and an equality operator for searching. This example
  153. shows the operator class definition for the <code class="type">text</code> data type:
  154. </p><pre class="programlisting">
  155. CREATE OPERATOR CLASS text_ops
  156. DEFAULT FOR TYPE text USING bloom AS
  157. OPERATOR 1 =(text, text),
  158. FUNCTION 1 hashtext(text);
  159. </pre></div><div class="sect2" id="id-1.11.7.14.10"><div class="titlepage"><div><div><h3 class="title">F.5.4. Limitations</h3></div></div></div><p>
  160. </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
  161. Only operator classes for <code class="type">int4</code> and <code class="type">text</code> are
  162. included with the module.
  163. </p></li><li class="listitem"><p>
  164. Only the <code class="literal">=</code> operator is supported for search. But
  165. it is possible to add support for arrays with union and intersection
  166. operations in the future.
  167. </p></li><li class="listitem"><p>
  168. <code class="literal">bloom</code> access method doesn't support
  169. <code class="literal">UNIQUE</code> indexes.
  170. </p></li><li class="listitem"><p>
  171. <code class="literal">bloom</code> access method doesn't support searching for
  172. <code class="literal">NULL</code> values.
  173. </p></li></ul></div><p>
  174. </p></div><div class="sect2" id="id-1.11.7.14.11"><div class="titlepage"><div><div><h3 class="title">F.5.5. Authors</h3></div></div></div><p>
  175. Teodor Sigaev <code class="email">&lt;<a class="email" href="mailto:teodor@postgrespro.ru">teodor@postgrespro.ru</a>&gt;</code>,
  176. Postgres Professional, Moscow, Russia
  177. </p><p>
  178. Alexander Korotkov <code class="email">&lt;<a class="email" href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>&gt;</code>,
  179. Postgres Professional, Moscow, Russia
  180. </p><p>
  181. Oleg Bartunov <code class="email">&lt;<a class="email" href="mailto:obartunov@postgrespro.ru">obartunov@postgrespro.ru</a>&gt;</code>,
  182. Postgres Professional, Moscow, Russia
  183. </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="auto-explain.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="btree-gin.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.4. auto_explain </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> F.6. btree_gin</td></tr></table></div></body></html>
上海开阖软件有限公司 沪ICP备12045867号-1