|
- <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>14.3. Controlling the Planner with Explicit JOIN Clauses</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets V1.79.1" /><link rel="prev" href="planner-stats.html" title="14.2. Statistics Used by the Planner" /><link rel="next" href="populate.html" title="14.4. Populating a Database" /></head><body><div xmlns="http://www.w3.org/TR/xhtml1/transitional" class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">14.3. Controlling the Planner with Explicit <code xmlns="http://www.w3.org/1999/xhtml" class="literal">JOIN</code> Clauses</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="planner-stats.html" title="14.2. Statistics Used by the Planner">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="performance-tips.html" title="Chapter 14. Performance Tips">Up</a></td><th width="60%" align="center">Chapter 14. Performance Tips</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 12.4 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="populate.html" title="14.4. Populating a Database">Next</a></td></tr></table><hr></hr></div><div class="sect1" id="EXPLICIT-JOINS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">14.3. Controlling the Planner with Explicit <code class="literal">JOIN</code> Clauses</h2></div></div></div><a id="id-1.5.13.6.2" class="indexterm"></a><p>
- It is possible
- to control the query planner to some extent by using the explicit <code class="literal">JOIN</code>
- syntax. To see why this matters, we first need some background.
- </p><p>
- In a simple join query, such as:
- </p><pre class="programlisting">
- SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
- </pre><p>
- the planner is free to join the given tables in any order. For
- example, it could generate a query plan that joins A to B, using
- the <code class="literal">WHERE</code> condition <code class="literal">a.id = b.id</code>, and then
- joins C to this joined table, using the other <code class="literal">WHERE</code>
- condition. Or it could join B to C and then join A to that result.
- Or it could join A to C and then join them with B — but that
- would be inefficient, since the full Cartesian product of A and C
- would have to be formed, there being no applicable condition in the
- <code class="literal">WHERE</code> clause to allow optimization of the join. (All
- joins in the <span class="productname">PostgreSQL</span> executor happen
- between two input tables, so it's necessary to build up the result
- in one or another of these fashions.) The important point is that
- these different join possibilities give semantically equivalent
- results but might have hugely different execution costs. Therefore,
- the planner will explore all of them to try to find the most
- efficient query plan.
- </p><p>
- When a query only involves two or three tables, there aren't many join
- orders to worry about. But the number of possible join orders grows
- exponentially as the number of tables expands. Beyond ten or so input
- tables it's no longer practical to do an exhaustive search of all the
- possibilities, and even for six or seven tables planning might take an
- annoyingly long time. When there are too many input tables, the
- <span class="productname">PostgreSQL</span> planner will switch from exhaustive
- search to a <em class="firstterm">genetic</em> probabilistic search
- through a limited number of possibilities. (The switch-over threshold is
- set by the <a class="xref" href="runtime-config-query.html#GUC-GEQO-THRESHOLD">geqo_threshold</a> run-time
- parameter.)
- The genetic search takes less time, but it won't
- necessarily find the best possible plan.
- </p><p>
- When the query involves outer joins, the planner has less freedom
- than it does for plain (inner) joins. For example, consider:
- </p><pre class="programlisting">
- SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
- </pre><p>
- Although this query's restrictions are superficially similar to the
- previous example, the semantics are different because a row must be
- emitted for each row of A that has no matching row in the join of B and C.
- Therefore the planner has no choice of join order here: it must join
- B to C and then join A to that result. Accordingly, this query takes
- less time to plan than the previous query. In other cases, the planner
- might be able to determine that more than one join order is safe.
- For example, given:
- </p><pre class="programlisting">
- SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id);
- </pre><p>
- it is valid to join A to either B or C first. Currently, only
- <code class="literal">FULL JOIN</code> completely constrains the join order. Most
- practical cases involving <code class="literal">LEFT JOIN</code> or <code class="literal">RIGHT JOIN</code>
- can be rearranged to some extent.
- </p><p>
- Explicit inner join syntax (<code class="literal">INNER JOIN</code>, <code class="literal">CROSS
- JOIN</code>, or unadorned <code class="literal">JOIN</code>) is semantically the same as
- listing the input relations in <code class="literal">FROM</code>, so it does not
- constrain the join order.
- </p><p>
- Even though most kinds of <code class="literal">JOIN</code> don't completely constrain
- the join order, it is possible to instruct the
- <span class="productname">PostgreSQL</span> query planner to treat all
- <code class="literal">JOIN</code> clauses as constraining the join order anyway.
- For example, these three queries are logically equivalent:
- </p><pre class="programlisting">
- SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
- SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
- SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
- </pre><p>
- But if we tell the planner to honor the <code class="literal">JOIN</code> order,
- the second and third take less time to plan than the first. This effect
- is not worth worrying about for only three tables, but it can be a
- lifesaver with many tables.
- </p><p>
- To force the planner to follow the join order laid out by explicit
- <code class="literal">JOIN</code>s,
- set the <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a> run-time parameter to 1.
- (Other possible values are discussed below.)
- </p><p>
- You do not need to constrain the join order completely in order to
- cut search time, because it's OK to use <code class="literal">JOIN</code> operators
- within items of a plain <code class="literal">FROM</code> list. For example, consider:
- </p><pre class="programlisting">
- SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
- </pre><p>
- With <code class="varname">join_collapse_limit</code> = 1, this
- forces the planner to join A to B before joining them to other tables,
- but doesn't constrain its choices otherwise. In this example, the
- number of possible join orders is reduced by a factor of 5.
- </p><p>
- Constraining the planner's search in this way is a useful technique
- both for reducing planning time and for directing the planner to a
- good query plan. If the planner chooses a bad join order by default,
- you can force it to choose a better order via <code class="literal">JOIN</code> syntax
- — assuming that you know of a better order, that is. Experimentation
- is recommended.
- </p><p>
- A closely related issue that affects planning time is collapsing of
- subqueries into their parent query. For example, consider:
- </p><pre class="programlisting">
- SELECT *
- FROM x, y,
- (SELECT * FROM a, b, c WHERE something) AS ss
- WHERE somethingelse;
- </pre><p>
- This situation might arise from use of a view that contains a join;
- the view's <code class="literal">SELECT</code> rule will be inserted in place of the view
- reference, yielding a query much like the above. Normally, the planner
- will try to collapse the subquery into the parent, yielding:
- </p><pre class="programlisting">
- SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
- </pre><p>
- This usually results in a better plan than planning the subquery
- separately. (For example, the outer <code class="literal">WHERE</code> conditions might be such that
- joining X to A first eliminates many rows of A, thus avoiding the need to
- form the full logical output of the subquery.) But at the same time,
- we have increased the planning time; here, we have a five-way join
- problem replacing two separate three-way join problems. Because of the
- exponential growth of the number of possibilities, this makes a big
- difference. The planner tries to avoid getting stuck in huge join search
- problems by not collapsing a subquery if more than <code class="varname">from_collapse_limit</code>
- <code class="literal">FROM</code> items would result in the parent
- query. You can trade off planning time against quality of plan by
- adjusting this run-time parameter up or down.
- </p><p>
- <a class="xref" href="runtime-config-query.html#GUC-FROM-COLLAPSE-LIMIT">from_collapse_limit</a> and <a class="xref" href="runtime-config-query.html#GUC-JOIN-COLLAPSE-LIMIT">join_collapse_limit</a>
- are similarly named because they do almost the same thing: one controls
- when the planner will <span class="quote">“<span class="quote">flatten out</span>”</span> subqueries, and the
- other controls when it will flatten out explicit joins. Typically
- you would either set <code class="varname">join_collapse_limit</code> equal to
- <code class="varname">from_collapse_limit</code> (so that explicit joins and subqueries
- act similarly) or set <code class="varname">join_collapse_limit</code> to 1 (if you want
- to control join order with explicit joins). But you might set them
- differently if you are trying to fine-tune the trade-off between planning
- time and run time.
- </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="planner-stats.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="performance-tips.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="populate.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">14.2. Statistics Used by the Planner </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> 14.4. Populating a Database</td></tr></table></div></body></html>
|