Estimate Actual Operator 1 1 SEQUENCE 6029 6029 ├─AGGREGATE 0.2 *AVG(l_quantity)
186899 189973 │TABLESEEK lineitem AS li
1 1 └─AGGREGATESUM(l_extendedprice) / 7.0
2000 537 INNERJOINHASHON l_partkey = p_partkey AND (l_quantity < (SubPlan 1))
249 198 │└TABLESCAN part AS part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
6001215 6001215 TABLESCAN lineitem AS lineitem
Trino
Estimate Actual Operator 1 1 PROJECT expr_18 AS avg_yearly
1 1 PROJECT sum / 7.0 AS expr_18
1 1 DISTRIBUTE ROUND ROBIN
1 1 AGGREGATESUM(sum_22) AS sum
- 1 DISTRIBUTE GATHER
- 1 AGGREGATESUM(l_extendedprice) AS sum_22
- 21150 FILTER l_quantity < (COALESCE(avg,avg_21) * 0.2)
240048 239307 CROSSJOINLOOP 240048 239307 │└LEFTOUTERJOINHASHON l_partkey = l_partkey_3
200000 200000 ││└DISTRIBUTEHASHON l_partkey_3
200000 200000 ││AGGREGATEAVG(avg_23) AS avg GROUPBYHASH l_partkey_3
6001215 200000 ││DISTRIBUTEHASHON l_partkey_3
6001215 200000 ││AGGREGATEAVG(l_quantity_6) AS avg_23 GROUPBY PARTIAL l_partkey_3
- 6001215 ││FILTER non_null
6001215 6001215 ││PROJECT l_quantity AS l_quantity_6, l_partkey AS l_partkey_3, true AS non_null
6001215 6001215 ││TABLESCAN lineitem
240048 239307 │INNERJOINHASHON l_partkey = p_partkey
6001215 6001215 ││└TABLESCAN lineitem
8000 7989 │DISTRIBUTE GATHER
200000 7989 │FILTER p_brand = 'Brand#13'
200000 200000 │TABLESCAN part
1 1 DISTRIBUTE GATHER
1 1 AGGREGATEAVG(null_value) AS avg_21
- 0 FILTER null_20
1 1 PROJECT 0.0 AS null_value, false AS null_20
1 1 SELECT
SQL Server
Estimate Actual Operator 1 1 SEQUENCE 30545 6029 ├─MATERIALISE AS m5
30545 6029 │GROUPBYSORT l_partkey
30545 6029 │SORT l_partkey
30545 6029 │INNERJOINHASHON l_partkey = p_partkey
1017 198 ││└TABLESCAN part WHERE p_container = 'MED CAN' AND p_brand = 'Brand#13'
60012 6029 │TABLESCAN lineitem WHERE BLOOM(l_partkey)
1 1 └─PROJECT Expr1010 / 7.0 AS Expr1011
1 1 PROJECTCASEWHEN Expr1056 = 0 THEN NULL ELSE Expr1057 END AS Expr1010
1 1 AGGREGATECOUNT(*) AS Expr1056, SUM(l_extendedprice) AS Expr1057
30 537 INNERJOINLOOPON l_quantity < 0.2 * Expr1008
30 6029 │└SCAN MATERIALISED m5
1 198 PROJECTCASEWHEN Expr1054 = 0 THEN NULL ELSE Expr1055 / CONVERT_IMPLICIT(decimal(19,0),Expr1054,0) END AS Expr1008
1 198 AGGREGATECOUNT(*) AS Expr1054, SUM(l_quantity) AS Expr1055
30 6029 SCAN MATERIALISED m5
Commentary
The Correlated Subquery
Consider this expression:
l_quantity < (SELECT 0.2 * AVG(l_quantity)
FROM tpch.lineitem
WHERE l_partkey = p_partkey);
The crucial optimisation is for the optimiser to rewrite inoto this:
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM tpch.lineitem
INNER JOIN tpch.part
ON l_partkey = p_partkey
INNER JOIN (SELECT l_partkey
, 0.2 * AVG(l_quantity) AS avg_quantity
FROM tpch.lineitem
GROUP BY l_partkey) AS decorrelated
ON decorrelated.l_partkey = p_partkey
WHERE p_brand = 'Brand#13'
AND p_container = 'MED CAN'
AND l_quantity < decorrelated.avg_quantityh;
That allows much higher speed hash joins.
Agggressive pushdown
It is possible to push down the highly selective filters on p_brand and p_container into the subquery, further reducing the amount of data processed.