|
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>F.5. bloom</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="auto-explain.html" title="F.4. auto_explain" /><link rel="next" href="btree-gin.html" title="F.6. btree_gin" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">F.5. bloom</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="auto-explain.html" title="F.4. auto_explain">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="contrib.html" title="Appendix F. Additional Supplied Modules">Up</a></td><th width="60%" align="center">Appendix F. Additional Supplied Modules</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="btree-gin.html" title="F.6. btree_gin">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="BLOOM"><div class="titlepage"><div><div><h2 class="title" style="clear: both">F.5. bloom</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.7">F.5.1. Parameters</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.8">F.5.2. Examples</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.9">F.5.3. Operator Class Interface</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.10">F.5.4. Limitations</a></span></dt><dt><span class="sect2"><a href="bloom.html#id-1.11.7.14.11">F.5.5. Authors</a></span></dt></dl></div><a id="id-1.11.7.14.2" class="indexterm"></a><p>
- <code class="literal">bloom</code> provides an index access method based on
- <a class="ulink" href="https://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a>.
- </p><p>
- A Bloom filter is a space-efficient data structure that is used to test
- whether an element is a member of a set. In the case of an index access
- method, it allows fast exclusion of non-matching tuples via signatures
- whose size is determined at index creation.
- </p><p>
- A signature is a lossy representation of the indexed attribute(s), and as
- such is prone to reporting false positives; that is, it may be reported
- that an element is in the set, when it is not. So index search results
- must always be rechecked using the actual attribute values from the heap
- entry. Larger signatures reduce the odds of a false positive and thus
- reduce the number of useless heap visits, but of course also make the index
- larger and hence slower to scan.
- </p><p>
- This type of index is most useful when a table has many attributes and
- queries test arbitrary combinations of them. A traditional btree index is
- faster than a bloom index, but it can require many btree indexes to support
- all possible queries where one needs only a single bloom index. Note
- however that bloom indexes only support equality queries, whereas btree
- indexes can also perform inequality and range searches.
- </p><div class="sect2" id="id-1.11.7.14.7"><div class="titlepage"><div><div><h3 class="title">F.5.1. Parameters</h3></div></div></div><p>
- A <code class="literal">bloom</code> index accepts the following parameters in its
- <code class="literal">WITH</code> clause:
- </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">length</code></span></dt><dd><p>
- Length of each signature (index entry) in bits. It is rounded up to the
- nearest multiple of <code class="literal">16</code>. The default is
- <code class="literal">80</code> bits and the maximum is <code class="literal">4096</code>.
- </p></dd></dl></div><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="literal">col1 — col32</code></span></dt><dd><p>
- Number of bits generated for each index column. Each parameter's name
- refers to the number of the index column that it controls. The default
- is <code class="literal">2</code> bits and the maximum is <code class="literal">4095</code>.
- Parameters for index columns not actually used are ignored.
- </p></dd></dl></div></div><div class="sect2" id="id-1.11.7.14.8"><div class="titlepage"><div><div><h3 class="title">F.5.2. Examples</h3></div></div></div><p>
- This is an example of creating a bloom index:
- </p><pre class="programlisting">
- CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
- WITH (length=80, col1=2, col2=2, col3=4);
- </pre><p>
- The index is created with a signature length of 80 bits, with attributes
- i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could
- have omitted the <code class="literal">length</code>, <code class="literal">col1</code>,
- and <code class="literal">col2</code> specifications since those have the default values.
- </p><p>
- Here is a more complete example of bloom index definition and usage, as
- well as a comparison with equivalent btree indexes. The bloom index is
- considerably smaller than the btree index, and can perform better.
- </p><pre class="programlisting">
- =# CREATE TABLE tbloom AS
- SELECT
- (random() * 1000000)::int as i1,
- (random() * 1000000)::int as i2,
- (random() * 1000000)::int as i3,
- (random() * 1000000)::int as i4,
- (random() * 1000000)::int as i5,
- (random() * 1000000)::int as i6
- FROM
- generate_series(1,10000000);
- SELECT 10000000
- =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
- CREATE INDEX
- =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
- pg_size_pretty
- ----------------
- 153 MB
- (1 row)
- =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
- CREATE INDEX
- =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
- pg_size_pretty
- ----------------
- 387 MB
- (1 row)
- </pre><p>
- A sequential scan over this large table takes a long time:
- </p><pre class="programlisting">
- =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
- QUERY PLAN
- ------------------------------------------------------------------------------------------------------------
- Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
- Filter: ((i2 = 898732) AND (i5 = 123451))
- Rows Removed by Filter: 10000000
- Planning time: 0.177 ms
- Execution time: 1445.473 ms
- (5 rows)
- </pre><p>
- </p><p>
- So the planner will usually select an index scan if possible.
- With a btree index, we get results like this:
- </p><pre class="programlisting">
- =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
- QUERY PLAN
- --------------------------------------------------------------------------------------------------------------------------------
- Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
- Index Cond: ((i2 = 898732) AND (i5 = 123451))
- Heap Fetches: 0
- Planning time: 0.193 ms
- Execution time: 445.770 ms
- (5 rows)
- </pre><p>
- </p><p>
- Bloom is better than btree in handling this type of search:
- </p><pre class="programlisting">
- =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
- QUERY PLAN
- ---------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
- Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
- Rows Removed by Index Recheck: 2439
- Heap Blocks: exact=2408
- -> Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
- Index Cond: ((i2 = 898732) AND (i5 = 123451))
- Planning time: 0.475 ms
- Execution time: 76.778 ms
- (8 rows)
- </pre><p>
- Note the relatively large number of false positives: 2439 rows were
- selected to be visited in the heap, but none actually matched the
- query. We could reduce that by specifying a larger signature length.
- In this example, creating the index with <code class="literal">length=200</code>
- reduced the number of false positives to 55; but it doubled the index size
- (to 306 MB) and ended up being slower for this query (125 ms overall).
- </p><p>
- Now, the main problem with the btree search is that btree is inefficient
- when the search conditions do not constrain the leading index column(s).
- A better strategy for btree is to create a separate index on each column.
- Then the planner will choose something like this:
- </p><pre class="programlisting">
- =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
- QUERY PLAN
- ------------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
- Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
- -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
- -> Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
- Index Cond: (i5 = 123451)
- -> Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
- Index Cond: (i2 = 898732)
- Planning time: 2.049 ms
- Execution time: 0.280 ms
- (9 rows)
- </pre><p>
- Although this query runs much faster than with either of the single
- indexes, we pay a large penalty in index size. Each of the single-column
- btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
- more than 8 times the space used by the bloom index.
- </p></div><div class="sect2" id="id-1.11.7.14.9"><div class="titlepage"><div><div><h3 class="title">F.5.3. Operator Class Interface</h3></div></div></div><p>
- An operator class for bloom indexes requires only a hash function for the
- indexed data type and an equality operator for searching. This example
- shows the operator class definition for the <code class="type">text</code> data type:
- </p><pre class="programlisting">
- CREATE OPERATOR CLASS text_ops
- DEFAULT FOR TYPE text USING bloom AS
- OPERATOR 1 =(text, text),
- FUNCTION 1 hashtext(text);
- </pre></div><div class="sect2" id="id-1.11.7.14.10"><div class="titlepage"><div><div><h3 class="title">F.5.4. Limitations</h3></div></div></div><p>
- </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
- Only operator classes for <code class="type">int4</code> and <code class="type">text</code> are
- included with the module.
- </p></li><li class="listitem"><p>
- Only the <code class="literal">=</code> operator is supported for search. But
- it is possible to add support for arrays with union and intersection
- operations in the future.
- </p></li><li class="listitem"><p>
- <code class="literal">bloom</code> access method doesn't support
- <code class="literal">UNIQUE</code> indexes.
- </p></li><li class="listitem"><p>
- <code class="literal">bloom</code> access method doesn't support searching for
- <code class="literal">NULL</code> values.
- </p></li></ul></div><p>
- </p></div><div class="sect2" id="id-1.11.7.14.11"><div class="titlepage"><div><div><h3 class="title">F.5.5. Authors</h3></div></div></div><p>
- Teodor Sigaev <code class="email"><<a class="email" href="mailto:teodor@postgrespro.ru">teodor@postgrespro.ru</a>></code>,
- Postgres Professional, Moscow, Russia
- </p><p>
- Alexander Korotkov <code class="email"><<a class="email" href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>></code>,
- Postgres Professional, Moscow, Russia
- </p><p>
- Oleg Bartunov <code class="email"><<a class="email" href="mailto:obartunov@postgrespro.ru">obartunov@postgrespro.ru</a>></code>,
- Postgres Professional, Moscow, Russia
- </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="auto-explain.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="contrib.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="btree-gin.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">F.4. auto_explain </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> F.6. btree_gin</td></tr></table></div></body></html>
|