|
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>7.8. WITH Queries (Common Table Expressions)</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="queries-values.html" title="7.7. VALUES Lists" /><link rel="next" href="datatype.html" title="Chapter 8. Data Types" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">7.8. <code xmlns="http://www.w3.org/1999/xhtml" class="literal">WITH</code> Queries (Common Table Expressions)</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="queries-values.html" title="7.7. VALUES Lists">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="queries.html" title="Chapter 7. Queries">Up</a></td><th width="60%" align="center">Chapter 7. Queries</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="datatype.html" title="Chapter 8. Data Types">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="QUERIES-WITH"><div class="titlepage"><div><div><h2 class="title" style="clear: both">7.8. <code class="literal">WITH</code> Queries (Common Table Expressions)</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-SELECT">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></a></span></dt><dt><span class="sect2"><a href="queries-with.html#QUERIES-WITH-MODIFYING">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></a></span></dt></dl></div><a id="id-1.5.6.12.2" class="indexterm"></a><a id="id-1.5.6.12.3" class="indexterm"></a><p>
- <code class="literal">WITH</code> provides a way to write auxiliary statements for use in a
- larger query. These statements, which are often referred to as Common
- Table Expressions or <acronym class="acronym">CTE</acronym>s, can be thought of as defining
- temporary tables that exist just for one query. Each auxiliary statement
- in a <code class="literal">WITH</code> clause can be a <code class="command">SELECT</code>,
- <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>; and the
- <code class="literal">WITH</code> clause itself is attached to a primary statement that can
- also be a <code class="command">SELECT</code>, <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or
- <code class="command">DELETE</code>.
- </p><div class="sect2" id="QUERIES-WITH-SELECT"><div class="titlepage"><div><div><h3 class="title">7.8.1. <code class="command">SELECT</code> in <code class="literal">WITH</code></h3></div></div></div><p>
- The basic value of <code class="command">SELECT</code> in <code class="literal">WITH</code> is to
- break down complicated queries into simpler parts. An example is:
-
- </p><pre class="programlisting">
- WITH regional_sales AS (
- SELECT region, SUM(amount) AS total_sales
- FROM orders
- GROUP BY region
- ), top_regions AS (
- SELECT region
- FROM regional_sales
- WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
- )
- SELECT region,
- product,
- SUM(quantity) AS product_units,
- SUM(amount) AS product_sales
- FROM orders
- WHERE region IN (SELECT region FROM top_regions)
- GROUP BY region, product;
- </pre><p>
-
- which displays per-product sales totals in only the top sales regions.
- The <code class="literal">WITH</code> clause defines two auxiliary statements named
- <code class="structname">regional_sales</code> and <code class="structname">top_regions</code>,
- where the output of <code class="structname">regional_sales</code> is used in
- <code class="structname">top_regions</code> and the output of <code class="structname">top_regions</code>
- is used in the primary <code class="command">SELECT</code> query.
- This example could have been written without <code class="literal">WITH</code>,
- but we'd have needed two levels of nested sub-<code class="command">SELECT</code>s. It's a bit
- easier to follow this way.
- </p><p>
- <a id="id-1.5.6.12.5.3.1" class="indexterm"></a>
- The optional <code class="literal">RECURSIVE</code> modifier changes <code class="literal">WITH</code>
- from a mere syntactic convenience into a feature that accomplishes
- things not otherwise possible in standard SQL. Using
- <code class="literal">RECURSIVE</code>, a <code class="literal">WITH</code> query can refer to its own
- output. A very simple example is this query to sum the integers from 1
- through 100:
-
- </p><pre class="programlisting">
- WITH RECURSIVE t(n) AS (
- VALUES (1)
- UNION ALL
- SELECT n+1 FROM t WHERE n < 100
- )
- SELECT sum(n) FROM t;
- </pre><p>
-
- The general form of a recursive <code class="literal">WITH</code> query is always a
- <em class="firstterm">non-recursive term</em>, then <code class="literal">UNION</code> (or
- <code class="literal">UNION ALL</code>), then a
- <em class="firstterm">recursive term</em>, where only the recursive term can contain
- a reference to the query's own output. Such a query is executed as
- follows:
- </p><div class="procedure" id="id-1.5.6.12.5.4"><p class="title"><strong>Recursive Query Evaluation</strong></p><ol class="procedure" type="1"><li class="step"><p>
- Evaluate the non-recursive term. For <code class="literal">UNION</code> (but not
- <code class="literal">UNION ALL</code>), discard duplicate rows. Include all remaining
- rows in the result of the recursive query, and also place them in a
- temporary <em class="firstterm">working table</em>.
- </p></li><li class="step"><p>
- So long as the working table is not empty, repeat these steps:
- </p><ol type="a" class="substeps"><li class="step"><p>
- Evaluate the recursive term, substituting the current contents of
- the working table for the recursive self-reference.
- For <code class="literal">UNION</code> (but not <code class="literal">UNION ALL</code>), discard
- duplicate rows and rows that duplicate any previous result row.
- Include all remaining rows in the result of the recursive query, and
- also place them in a temporary <em class="firstterm">intermediate table</em>.
- </p></li><li class="step"><p>
- Replace the contents of the working table with the contents of the
- intermediate table, then empty the intermediate table.
- </p></li></ol></li></ol></div><div class="note"><h3 class="title">Note</h3><p>
- Strictly speaking, this process is iteration not recursion, but
- <code class="literal">RECURSIVE</code> is the terminology chosen by the SQL standards
- committee.
- </p></div><p>
- In the example above, the working table has just a single row in each step,
- and it takes on the values from 1 through 100 in successive steps. In
- the 100th step, there is no output because of the <code class="literal">WHERE</code>
- clause, and so the query terminates.
- </p><p>
- Recursive queries are typically used to deal with hierarchical or
- tree-structured data. A useful example is this query to find all the
- direct and indirect sub-parts of a product, given only a table that
- shows immediate inclusions:
-
- </p><pre class="programlisting">
- WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
- SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
- UNION ALL
- SELECT p.sub_part, p.part, p.quantity
- FROM included_parts pr, parts p
- WHERE p.part = pr.sub_part
- )
- SELECT sub_part, SUM(quantity) as total_quantity
- FROM included_parts
- GROUP BY sub_part
- </pre><p>
- </p><p>
- When working with recursive queries it is important to be sure that
- the recursive part of the query will eventually return no tuples,
- or else the query will loop indefinitely. Sometimes, using
- <code class="literal">UNION</code> instead of <code class="literal">UNION ALL</code> can accomplish this
- by discarding rows that duplicate previous output rows. However, often a
- cycle does not involve output rows that are completely duplicate: it may be
- necessary to check just one or a few fields to see if the same point has
- been reached before. The standard method for handling such situations is
- to compute an array of the already-visited values. For example, consider
- the following query that searches a table <code class="structname">graph</code> using a
- <code class="structfield">link</code> field:
-
- </p><pre class="programlisting">
- WITH RECURSIVE search_graph(id, link, data, depth) AS (
- SELECT g.id, g.link, g.data, 1
- FROM graph g
- UNION ALL
- SELECT g.id, g.link, g.data, sg.depth + 1
- FROM graph g, search_graph sg
- WHERE g.id = sg.link
- )
- SELECT * FROM search_graph;
- </pre><p>
-
- This query will loop if the <code class="structfield">link</code> relationships contain
- cycles. Because we require a <span class="quote">“<span class="quote">depth</span>”</span> output, just changing
- <code class="literal">UNION ALL</code> to <code class="literal">UNION</code> would not eliminate the looping.
- Instead we need to recognize whether we have reached the same row again
- while following a particular path of links. We add two columns
- <code class="structfield">path</code> and <code class="structfield">cycle</code> to the loop-prone query:
-
- </p><pre class="programlisting">
- WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
- SELECT g.id, g.link, g.data, 1,
- ARRAY[g.id],
- false
- FROM graph g
- UNION ALL
- SELECT g.id, g.link, g.data, sg.depth + 1,
- path || g.id,
- g.id = ANY(path)
- FROM graph g, search_graph sg
- WHERE g.id = sg.link AND NOT cycle
- )
- SELECT * FROM search_graph;
- </pre><p>
-
- Aside from preventing cycles, the array value is often useful in its own
- right as representing the <span class="quote">“<span class="quote">path</span>”</span> taken to reach any particular row.
- </p><p>
- In the general case where more than one field needs to be checked to
- recognize a cycle, use an array of rows. For example, if we needed to
- compare fields <code class="structfield">f1</code> and <code class="structfield">f2</code>:
-
- </p><pre class="programlisting">
- WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
- SELECT g.id, g.link, g.data, 1,
- ARRAY[ROW(g.f1, g.f2)],
- false
- FROM graph g
- UNION ALL
- SELECT g.id, g.link, g.data, sg.depth + 1,
- path || ROW(g.f1, g.f2),
- ROW(g.f1, g.f2) = ANY(path)
- FROM graph g, search_graph sg
- WHERE g.id = sg.link AND NOT cycle
- )
- SELECT * FROM search_graph;
- </pre><p>
- </p><div class="tip"><h3 class="title">Tip</h3><p>
- Omit the <code class="literal">ROW()</code> syntax in the common case where only one field
- needs to be checked to recognize a cycle. This allows a simple array
- rather than a composite-type array to be used, gaining efficiency.
- </p></div><div class="tip"><h3 class="title">Tip</h3><p>
- The recursive query evaluation algorithm produces its output in
- breadth-first search order. You can display the results in depth-first
- search order by making the outer query <code class="literal">ORDER BY</code> a
- <span class="quote">“<span class="quote">path</span>”</span> column constructed in this way.
- </p></div><p>
- A helpful trick for testing queries
- when you are not certain if they might loop is to place a <code class="literal">LIMIT</code>
- in the parent query. For example, this query would loop forever without
- the <code class="literal">LIMIT</code>:
-
- </p><pre class="programlisting">
- WITH RECURSIVE t(n) AS (
- SELECT 1
- UNION ALL
- SELECT n+1 FROM t
- )
- SELECT n FROM t LIMIT 100;
- </pre><p>
-
- This works because <span class="productname">PostgreSQL</span>'s implementation
- evaluates only as many rows of a <code class="literal">WITH</code> query as are actually
- fetched by the parent query. Using this trick in production is not
- recommended, because other systems might work differently. Also, it
- usually won't work if you make the outer query sort the recursive query's
- results or join them to some other table, because in such cases the
- outer query will usually try to fetch all of the <code class="literal">WITH</code> query's
- output anyway.
- </p><p>
- A useful property of <code class="literal">WITH</code> queries is that they are
- normally evaluated only once per execution of the parent query, even if
- they are referred to more than once by the parent query or
- sibling <code class="literal">WITH</code> queries.
- Thus, expensive calculations that are needed in multiple places can be
- placed within a <code class="literal">WITH</code> query to avoid redundant work. Another
- possible application is to prevent unwanted multiple evaluations of
- functions with side-effects.
- However, the other side of this coin is that the optimizer is not able to
- push restrictions from the parent query down into a multiply-referenced
- <code class="literal">WITH</code> query, since that might affect all uses of the
- <code class="literal">WITH</code> query's output when it should affect only one.
- The multiply-referenced <code class="literal">WITH</code> query will be
- evaluated as written, without suppression of rows that the parent query
- might discard afterwards. (But, as mentioned above, evaluation might stop
- early if the reference(s) to the query demand only a limited number of
- rows.)
- </p><p>
- However, if a <code class="literal">WITH</code> query is non-recursive and
- side-effect-free (that is, it is a <code class="literal">SELECT</code> containing
- no volatile functions) then it can be folded into the parent query,
- allowing joint optimization of the two query levels. By default, this
- happens if the parent query references the <code class="literal">WITH</code> query
- just once, but not if it references the <code class="literal">WITH</code> query
- more than once. You can override that decision by
- specifying <code class="literal">MATERIALIZED</code> to force separate calculation
- of the <code class="literal">WITH</code> query, or by specifying <code class="literal">NOT
- MATERIALIZED</code> to force it to be merged into the parent query.
- The latter choice risks duplicate computation of
- the <code class="literal">WITH</code> query, but it can still give a net savings if
- each usage of the <code class="literal">WITH</code> query needs only a small part
- of the <code class="literal">WITH</code> query's full output.
- </p><p>
- A simple example of these rules is
- </p><pre class="programlisting">
- WITH w AS (
- SELECT * FROM big_table
- )
- SELECT * FROM w WHERE key = 123;
- </pre><p>
- This <code class="literal">WITH</code> query will be folded, producing the same
- execution plan as
- </p><pre class="programlisting">
- SELECT * FROM big_table WHERE key = 123;
- </pre><p>
- In particular, if there's an index on <code class="structfield">key</code>,
- it will probably be used to fetch just the rows having <code class="literal">key =
- 123</code>. On the other hand, in
- </p><pre class="programlisting">
- WITH w AS (
- SELECT * FROM big_table
- )
- SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
- WHERE w2.key = 123;
- </pre><p>
- the <code class="literal">WITH</code> query will be materialized, producing a
- temporary copy of <code class="structname">big_table</code> that is then
- joined with itself — without benefit of any index. This query
- will be executed much more efficiently if written as
- </p><pre class="programlisting">
- WITH w AS NOT MATERIALIZED (
- SELECT * FROM big_table
- )
- SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
- WHERE w2.key = 123;
- </pre><p>
- so that the parent query's restrictions can be applied directly
- to scans of <code class="structname">big_table</code>.
- </p><p>
- An example where <code class="literal">NOT MATERIALIZED</code> could be
- undesirable is
- </p><pre class="programlisting">
- WITH w AS (
- SELECT key, very_expensive_function(val) as f FROM some_table
- )
- SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
- </pre><p>
- Here, materialization of the <code class="literal">WITH</code> query ensures
- that <code class="function">very_expensive_function</code> is evaluated only
- once per table row, not twice.
- </p><p>
- The examples above only show <code class="literal">WITH</code> being used with
- <code class="command">SELECT</code>, but it can be attached in the same way to
- <code class="command">INSERT</code>, <code class="command">UPDATE</code>, or <code class="command">DELETE</code>.
- In each case it effectively provides temporary table(s) that can
- be referred to in the main command.
- </p></div><div class="sect2" id="QUERIES-WITH-MODIFYING"><div class="titlepage"><div><div><h3 class="title">7.8.2. Data-Modifying Statements in <code class="literal">WITH</code></h3></div></div></div><p>
- You can use data-modifying statements (<code class="command">INSERT</code>,
- <code class="command">UPDATE</code>, or <code class="command">DELETE</code>) in <code class="literal">WITH</code>. This
- allows you to perform several different operations in the same query.
- An example is:
-
- </p><pre class="programlisting">
- WITH moved_rows AS (
- DELETE FROM products
- WHERE
- "date" >= '2010-10-01' AND
- "date" < '2010-11-01'
- RETURNING *
- )
- INSERT INTO products_log
- SELECT * FROM moved_rows;
- </pre><p>
-
- This query effectively moves rows from <code class="structname">products</code> to
- <code class="structname">products_log</code>. The <code class="command">DELETE</code> in <code class="literal">WITH</code>
- deletes the specified rows from <code class="structname">products</code>, returning their
- contents by means of its <code class="literal">RETURNING</code> clause; and then the
- primary query reads that output and inserts it into
- <code class="structname">products_log</code>.
- </p><p>
- A fine point of the above example is that the <code class="literal">WITH</code> clause is
- attached to the <code class="command">INSERT</code>, not the sub-<code class="command">SELECT</code> within
- the <code class="command">INSERT</code>. This is necessary because data-modifying
- statements are only allowed in <code class="literal">WITH</code> clauses that are attached
- to the top-level statement. However, normal <code class="literal">WITH</code> visibility
- rules apply, so it is possible to refer to the <code class="literal">WITH</code>
- statement's output from the sub-<code class="command">SELECT</code>.
- </p><p>
- Data-modifying statements in <code class="literal">WITH</code> usually have
- <code class="literal">RETURNING</code> clauses (see <a class="xref" href="dml-returning.html" title="6.4. Returning Data From Modified Rows">Section 6.4</a>),
- as shown in the example above.
- It is the output of the <code class="literal">RETURNING</code> clause, <span class="emphasis"><em>not</em></span> the
- target table of the data-modifying statement, that forms the temporary
- table that can be referred to by the rest of the query. If a
- data-modifying statement in <code class="literal">WITH</code> lacks a <code class="literal">RETURNING</code>
- clause, then it forms no temporary table and cannot be referred to in
- the rest of the query. Such a statement will be executed nonetheless.
- A not-particularly-useful example is:
-
- </p><pre class="programlisting">
- WITH t AS (
- DELETE FROM foo
- )
- DELETE FROM bar;
- </pre><p>
-
- This example would remove all rows from tables <code class="structname">foo</code> and
- <code class="structname">bar</code>. The number of affected rows reported to the client
- would only include rows removed from <code class="structname">bar</code>.
- </p><p>
- Recursive self-references in data-modifying statements are not
- allowed. In some cases it is possible to work around this limitation by
- referring to the output of a recursive <code class="literal">WITH</code>, for example:
-
- </p><pre class="programlisting">
- WITH RECURSIVE included_parts(sub_part, part) AS (
- SELECT sub_part, part FROM parts WHERE part = 'our_product'
- UNION ALL
- SELECT p.sub_part, p.part
- FROM included_parts pr, parts p
- WHERE p.part = pr.sub_part
- )
- DELETE FROM parts
- WHERE part IN (SELECT part FROM included_parts);
- </pre><p>
-
- This query would remove all direct and indirect subparts of a product.
- </p><p>
- Data-modifying statements in <code class="literal">WITH</code> are executed exactly once,
- and always to completion, independently of whether the primary query
- reads all (or indeed any) of their output. Notice that this is different
- from the rule for <code class="command">SELECT</code> in <code class="literal">WITH</code>: as stated in the
- previous section, execution of a <code class="command">SELECT</code> is carried only as far
- as the primary query demands its output.
- </p><p>
- The sub-statements in <code class="literal">WITH</code> are executed concurrently with
- each other and with the main query. Therefore, when using data-modifying
- statements in <code class="literal">WITH</code>, the order in which the specified updates
- actually happen is unpredictable. All the statements are executed with
- the same <em class="firstterm">snapshot</em> (see <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>), so they
- cannot <span class="quote">“<span class="quote">see</span>”</span> one another's effects on the target tables. This
- alleviates the effects of the unpredictability of the actual order of row
- updates, and means that <code class="literal">RETURNING</code> data is the only way to
- communicate changes between different <code class="literal">WITH</code> sub-statements and
- the main query. An example of this is that in
-
- </p><pre class="programlisting">
- WITH t AS (
- UPDATE products SET price = price * 1.05
- RETURNING *
- )
- SELECT * FROM products;
- </pre><p>
-
- the outer <code class="command">SELECT</code> would return the original prices before the
- action of the <code class="command">UPDATE</code>, while in
-
- </p><pre class="programlisting">
- WITH t AS (
- UPDATE products SET price = price * 1.05
- RETURNING *
- )
- SELECT * FROM t;
- </pre><p>
-
- the outer <code class="command">SELECT</code> would return the updated data.
- </p><p>
- Trying to update the same row twice in a single statement is not
- supported. Only one of the modifications takes place, but it is not easy
- (and sometimes not possible) to reliably predict which one. This also
- applies to deleting a row that was already updated in the same statement:
- only the update is performed. Therefore you should generally avoid trying
- to modify a single row twice in a single statement. In particular avoid
- writing <code class="literal">WITH</code> sub-statements that could affect the same rows
- changed by the main statement or a sibling sub-statement. The effects
- of such a statement will not be predictable.
- </p><p>
- At present, any table used as the target of a data-modifying statement in
- <code class="literal">WITH</code> must not have a conditional rule, nor an <code class="literal">ALSO</code>
- rule, nor an <code class="literal">INSTEAD</code> rule that expands to multiple statements.
- </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="queries-values.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="queries.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="datatype.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">7.7. <code class="literal">VALUES</code> Lists </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 8. Data Types</td></tr></table></div></body></html>
|