Query Optimization Technology for Correlated Subqueries

Background

-- Information of all customers who have never ordered
Select c_custkey
from
customer
where
not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
-- Information of all customers who have never ordered
select c_custkey
from
customer
left join (
select
distinct o_custkey
from
orders
) on o_custkey = c_custkey
where
o_custkey is null
-- tpch q17
SELECT Sum(l1.extendedprice) / 7.0 AS avg_yearly
FROM lineitem l1,
part p
WHERE p.partkey = l1.partkey
AND p.brand = 'Brand#44'
AND p.container = 'WRAP PKG'
AND l1.quantity < (SELECT 0.2 * Avg(l2.quantity)
FROM lineitem l2
WHERE l2.partkey = p.partkey);
-- Customer and corresponding total consumption
select
c_custkey,
(
select sum(o_totalprice)
from
orders
where o_custkey = c_custkey
)
from
customer
select
c_custkey,
(
select
sum(o_totalprice)
from
orders
where
o_custkey = c_custkey
and '2020-05-27' > (
select
max(l_receiptdate)
from
lineitem
where
l_orderkey = o_orderkey
)

from
customer
SELECT Sum(l1.extendedprice) / 7.0 AS avg_yearly 
FROM lineitem l1,
part p
WHERE p.partkey = l1.partkey
AND p.brand = 'Brand#44'
AND p.container = 'WRAP PKG'
AND l1.quantity < (SELECT 0.2 * Avg(l2.quantity)
FROM lineitem l2
WHERE l2.partkey = p.partkey);
SELECT 0.2 * Avg(l2.quantity) 
FROM lineitem l2
WHERE l2.partkey = p.partkey -- p.partkey is a column of external query
+---------------+
| Plan Details |
+---------------+
1- Output[avg_yearly] avg_yearly := expr
2 -> Project[] expr := (`sum` / DOUBLE '7.0')
3 - Aggregate sum := `sum`(`extendedprice`)
4 -> Filter[p.`partkey` = l1.`partkey` AND `brand` = 'Brand#51' AND `container` = 'WRAP PACK' AND `quantity` < `result`]
5 - CorrelatedJoin[[p.`partkey`]]
6 - CrossJoin
7 - TableScan[tpch:lineitem l1]
8 - TableScan[tpch:part p]
9 - Scalar
10 -> Project[] result := (DOUBLE '0.2' * `avg`)
11 - Aggregate avg := `avg`(`quantity`)
12 -> Filter[(p.`partkey` = l2`partkey`)]
13 - TableScan[tpch:lineitem l2]
-- When p.partkey = 25, the corresponding subquery is
SELECT 0.2 * Avg(l2.quantity)
FROM lineitem l2
WHERE l2.partkey = 25

Common Decorrelation Methods

  1. The correlated join operator adds a column to the external query result.
  2. The result of the added column is obtained by computing the value of the correlated column of the external query in the subquery. If more than one row of computing results is added, an error is reported. Null computing result should also be added.
  3. Unlike the join operator, the correlated join operator does not change other columns of the external query, that is, the row quantity is not changed.

Pushdown Rules

Reuse the Results

  1. New external queries always have keys since the distinct operation has been performed.
  2. The correlatedjoin* operator filters the external query columns, so its pushdown is simpler, which does not require the assign unique id or retain all rows.

Optimization of Correlated Subqueries

Correlated Subqueries of exists/in/filter

Lifting of Correlated Conditions

SELECT t1.c2
FROM
t1
WHERE t1.c2 < (
SELECT 0.2 * max(t2.x)
FROM
t2
WHERE t2.c1 = t2.c1
GROUP BY t2.c1
);

Cost-related Subquery Optimization

  1. If the private key does not indicate that the agg function is sensitive to duplicates, the window operator will block the filter-join pushdown. This interrupts joingraph, thus increasing the intermediate result of join.
  2. For rewriting as a join with two subtrees, the filter-join can be pushed down to one of the subtrees. However, the filter-join cannot be pushed down after rewriting as window.
  3. If the pipeline execution model or cte is used, table scanning produces limited benefits.
  4. The optimization processing of traditional optimizers for join and agg are much better than for window and the optimization rules are much richer.

Use of Equivalent Columns

Select t1.c2
From
t1
Where
t1.c3 < (
Select min(t2.c3)
From t2
Where t1.c1 = t2.c1
group by t1.c1
)
-- Replace t1.ct with t2.c1 in subquery for simplificationSelect t1.c2
From
t1
Where
t1.c3 < (
Select min(t2.c3)
From t2
Where t1.c1 = t2.c1
group by t2.c1
)

Subquery Related Optimization Rules

  1. The row number after the correlatedjoin operator is the same as that in the left subtree.
  2. The output of enforcesinglerow is one row.
  3. The correlated column of the external query determines the newly added output column of correlatedjoin. This is called function dependency.
  4. The key generated by the assign unique id has unique attributes and can be used for subsequent simplification of aggregation and group by.
  5. The meaningless sort operations in the subquery can be filtered.
  1. There is no need to add the uniqueid column to the originally unique external query.
  2. Tailoring is allowed when the output of enforcesinglerow sub-node is always one row.
  3. Some subqueries with the correlated column on the project can be rewritten as exists subqueries in some cases.
sql select t1.orderkey, ( select min(t1.orderkey) from orders t2 where t2.orderkey > t1.orderkey ) from orders t1 order by 1

Notes:

Select t1.c2
From
t1
Where
t1.c3 < (
Select COUNT (*)
From t2
Where t1.c1 = t2.c1
)
select t1.c1 
in (
select
t2.c1
from
t2)
from
t1
-- sql 1
SELECT t1.c1
FROM t1
WHERE t1.c2 > (
SELECT min(t2.c2)
FROM t2
WHERE t2.c1 = t1.c1
);
-- sql 2
SELECT t1.c1
FROM t1
WHERE t1.c2 > (
SELECT min(t2.c2)
FROM t2
WHERE t2.c1 > t1.c1
);

References

  1. Neumann, Thomas, and Alfons Kemper. “Unnesting arbitrary queries.” Datenbanksysteme für Business, Technologie und Web (BTW 2015) (2015).
  2. Neumann, Thomas, Viktor Leis, and Alfons Kemper. “The Complete Story of Joins (inHyPer).” Datenbanksysteme für Business, Technologie und Web (BTW 2017) (2017).
  3. Zuzarte, Calisto, et al. “Winmagic: Subquery elimination using window aggregation.” Proceedings of the 2003 ACM SIGMOD international conference on Management of data. 2003.
  4. Bellamkonda, Srikanth, et al. “Enhanced subquery optimizations in oracle.” Proceedings of the VLDB Endowment 2.2 (2009): 1366–1377
  5. Galindo-Legaria, César, and Milind Joshi. “Orthogonal optimization of subqueries and aggregation.” ACM SIGMOD Record 30.2 (2001): 571–581.
  6. Galindo-Legaria, C. A. “Parameterized queries and nesting equivalences.” Technical report, Microsoft, 2001. MSR-TR-2000–31, 2000.

Original Source:

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com