PlannerTPC-H — TPCH-Q17

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
11,997,873
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,005,807
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
206,987
207K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,005,807
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
11,841,430
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
11,853,388
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
18,442
18K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,544
6.5K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
6,187,501
6.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
5,998,820
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
6,080
6.1K
Rank
Estimation Error
Est Err
Row Operations
Ops
188,944
189K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
12,363
12K
Rank
Estimation Error
Est Err
Row Operations
Ops
928
928
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
6,544
6.5K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,544
6.5K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -         1  PROJECT SUM(l_extendedprice) / 7., SUM(l_extendedprice) / 7.
       -         1  GROUP BY HASH AGGREGATE SUM(l_extendedprice)
       -      6987  FILTER (p_brand = 'Brand#13') AND (p_container = 'MED CAN') AND (l_quantity < )
       -      6987  LEFT OUTER JOIN HASH ON p_partkey = p_partkey
       -    200000  │└PROJECT 0.2 * AVG(l_quantity)
       -    200000   GROUP BY HASH p_partkey AGGREGATE AVG(l_quantity)
       -   5998820   PROJECT l_partkey
       -   5998820   TABLE SCAN lineitem WHERE TRUE
       -      6987  INNER JOIN HASH ON l_partkey = p_partkey
       -       233  │└TABLE SCAN part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
       -   5998820  TABLE SCAN lineitem
DuckDB
Estimate    Actual  Operator
       1         1  PROJECT avg_yearly
       -         1  GROUP BY SIMPLE AGGREGATE SUM(#0)
  162389       464  PROJECT l_extendedprice
  162389       464  FILTER CAST(l_quantity AS DOUBLE) < SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
  109375       202  │└GROUP BY HASH #0 AGGREGATE 
       0      6080   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
       -         0   │└SCAN EMPTY
 1717800       202   PROJECT 0.2 * AVG(l_quantity), p_partkey
 1717800       202   GROUP BY HASH #0 AGGREGATE AVG(#1)
 3435601      6080   PROJECT p_partkey, l_quantity
 3435601      6080   INNER JOIN HASH ON l_partkey = p_partkey
  109375         0   │└SCAN MATERIALISED
 5998820   5920614   TABLE SCAN lineitem
  162389      6080  INNER JOIN HASH ON l_partkey = p_partkey
    5406       202  │└TABLE SCAN part WHERE p_brand = 'Brand#13' AND p_container = 'MED CAN'
 5998820   5920614  TABLE SCAN lineitem
SQL Server
Estimate    Actual  Operator
       1         ∞  PROJECT Expr1009 / 7.0 AS Expr1010
       1         1  PROJECT CASE WHEN Expr1022 = 0 THEN NULL ELSE Expr1023 END AS Expr1009
       1         1  GROUP BY HASH AGGREGATE COUNT(l_extendedprice) AS Expr1022, SUM(l_extendedprice) AS Expr1023
 5998820       464  INNER JOIN LOOP ON l_quantity < 0.2 * Expr1007
      30       464  │└INNER JOIN LOOP ON 
      30      6080   │└TABLE SCAN lineitem
       1       202   PROJECT CASE WHEN Expr1020 = 0 THEN NULL ELSE Expr1021 / CONVERT_IMPLICIT(decimal(19,0),Expr1020,0) END AS Expr1007
       1       202   GROUP BY HASH AGGREGATE COUNT(l_quantity) AS Expr1020, SUM(l_quantity) AS Expr1021
      30      6080   TABLE SCAN lineitem
    1007       203  TABLE SCAN lineitem
PostgreSQL
Estimate    Actual  Operator
       1         1  GROUP BY SIMPLE AGGREGATE SUM(l_extendedprice) / 7.0
    1930       464  INNER JOIN HASH ON l_partkey = p_partkey AND (l_quantity < (SubPlan 1))
    6080      6080  │└GROUP BY SIMPLE AGGREGATE 0.2 * AVG(l_quantity)
  188480    188480   TABLE SEEK li
     240       201   TABLE SCAN part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
 5998820   5998820  TABLE SCAN lineitem

Commentary



The Correlated Subquery

Consider this expression:

l_quantity < (SELECT 0.2 * AVG(l_quantity)
                    FROM tpch.lineitem
                    WHERE l_partkey = p_partkey);

The crucial optimisation is for the optimiser to rewrite inoto this:

SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM tpch.lineitem
INNER JOIN tpch.part
    ON l_partkey = p_partkey
INNER JOIN (SELECT l_partkey
                 , 0.2 * AVG(l_quantity) AS avg_quantity
            FROM tpch.lineitem
            GROUP BY l_partkey) AS decorrelated
    ON decorrelated.l_partkey = p_partkey
WHERE p_brand = 'Brand#13'
  AND p_container = 'MED CAN'
  AND l_quantity < decorrelated.avg_quantityh;

That allows much higher speed hash joins.

Agggressive pushdown

It is possible to push down the highly selective filters on p_brand and p_container into the subquery, further reducing the amount of data processed.