PlannerTPC-H — TPCH-Q02

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
1,220,854
1.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
2,407,934
2.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
516
516
Rank
Estimation Error
Est Err
Row Operations
Ops
1,425,587
1.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
120,875
121K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
1,197,599
1.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,201,489
1.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
346
346
Rank
Estimation Error
Est Err
Row Operations
Ops
12,706
13K
Rank
Estimation Error
Est Err
Row Operations
Ops
473
473
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
614,602
615K
Rank
Estimation Error
Est Err
Row Operations
Ops
621,442
621K
Rank
Estimation Error
Est Err
Row Operations
Ops
346
346
Rank
Estimation Error
Est Err
Row Operations
Ops
133,172
133K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,145
1.1K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
610,825
611K
Rank
Estimation Error
Est Err
Row Operations
Ops
614,728
615K
Rank
Estimation Error
Est Err
Row Operations
Ops
819
819
Rank
Estimation Error
Est Err
Row Operations
Ops
7,565
7.6K
Rank
Estimation Error
Est Err
Row Operations
Ops
473
473
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -       516  SORT s_acctbal, n_name, s_name, p_partkey
       -       516  FILTER (p_size = 25) AND (p_type LIKE '%BRASS') AND (r_name = 'EUROPE') AND (ps_supplycost = )
       -       516  LEFT OUTER JOIN HASH ON p_partkey = p_partkey
       -     89290  │└PROJECT MIN(ps_supplycost)
       -     89290   GROUP BY HASH p_partkey AGGREGATE MIN(ps_supplycost)
       -    120875   PROJECT ps_partkey
       -    120875   INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1   │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -    599941   INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25   │└TABLE SCAN nation
       -    599941   INNER JOIN HASH ON ps_suppkey = s_suppkey
       -     10000   │└TABLE SCAN supplier
       -    600000   TABLE SCAN partsupp WHERE ps_partkey = ps_partkey
       -       516  INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -      2512  INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25  │└TABLE SCAN nation
       -      2512  INNER JOIN HASH ON ps_suppkey = s_suppkey
       -     10000  │└TABLE SCAN supplier
       -      2512  INNER JOIN HASH ON ps_partkey = p_partkey
       -       802  │└TABLE SCAN part WHERE (p_size = 25) AND (p_type LIKE '%BRASS')
       -    600000  TABLE SCAN partsupp
DuckDB
Estimate    Actual  Operator
       -       346  SORT s_acctbal, n_name, s_name, p_partkey
    2981       346  PROJECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    2981       346  FILTER ps_supplycost = SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    2910       346  │└GROUP BY HASH #0 AGGREGATE 
       0       473   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
       -         0   │└SCAN EMPTY
    3151       346   PROJECT min_cost, p_partkey
    3151       346   GROUP BY HASH #0 AGGREGATE MIN(#1)
    6302       473   PROJECT p_partkey, ps_supplycost
    6302       473   INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
   10000      7192    TABLE SCAN supplier
   28228      1383   INNER JOIN HASH ON ps_partkey = p_partkey
    2910         0   │└SCAN MATERIALISED
  600000    591202   TABLE SCAN partsupp WHERE ps_suppkey >= 1
    2981       473  INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033  │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5    TABLE SCAN nation
   10000      7192   TABLE SCAN supplier
   13354      2362  INNER JOIN HASH ON ps_partkey = p_partkey
    4445       591  │└FILTER p_partkey <= 150000
    4445       799   TABLE SCAN part WHERE p_size = 25 AND suffix(p_type,'BRASS')
  600000    591202  TABLE SCAN partsupp WHERE ps_suppkey >= 1
SQL Server
Estimate    Actual  Operator
     222       346  SORT s_acctbal, n_name, s_name, ps_partkey
     710       346  LIMIT 0
    1647       473  GROUP BY HASH  AGGREGATE 
    1647       473  SORT ps_partkey, ps_supplycost
    1647       473  INNER JOIN HASH ON r_regionkey = n_regionkey
       1         1  │└TABLE SEEK region WHERE r_name = 'EUROPE'
    8226      2364  INNER JOIN HASH ON n_nationkey = s_nationkey
      25        25  │└TABLE SEEK nation
    8226      2364  INNER JOIN HASH ON s_suppkey = ps_suppkey
    8227      2364  │└INNER JOIN HASH ON ps_partkey = p_partkey
    1746       799   │└TABLE SEEK part WHERE p_size = 25 AND p_type LIKE '%BRASS'
  600000    600000   TABLE SCAN partsupp WHERE ps_partkey IS NOT NULL AND ps_supplycost IS NOT NULL
   10000     10000  TABLE SEEK supplier
PostgreSQL
Estimate    Actual  Operator
       1       346  SORT s_acctbal, n_name, s_name, p_partkey
       1       346  INNER JOIN HASH ON (p_partkey = ps_partkey) AND ((SubPlan 1) = ps_supplycost)
    1145      1145  │└GROUP BY SIMPLE AGGREGATE MIN(ps_supplycost)
    1145      1145   INNER JOIN LOOP ON r_regionkey = n_regionkey AND (r_regionkey = n_regionkey)
       1         1   │└TABLE SCAN r_min WHERE r_name = 'EUROPE'
    4580      3435   INNER JOIN LOOP ON n_nationkey = s_nationkey AND (n_nationkey = s_nationkey)
      25        25   │└TABLE SCAN n_min
    4580      3435   INNER JOIN LOOP ON s_suppkey = ps_suppkey
       4         3   │└TABLE SCAN ps_min WHERE ps_partkey = p_partkey
    3748      3748   TABLE SEEK s_min
  119940    121974   INNER JOIN HASH ON ps_suppkey = s_suppkey
    2000      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25     TABLE SCAN nation
   10000     10000    TABLE SCAN supplier
  600000    600000   TABLE SCAN partsupp
     817       799  TABLE SCAN part WHERE (p_type LIKE '%BRASS') AND (p_size = 25)

Commentary



The correlated Subquery

The fascinating bit about this query is the nested subquery: SELECT MIN(ps_supplycost) FROM tpch.partsupp....

That subquery is again correlated with the outer query via this filter: ps_partkey = p_partkey. But this time, the correlation is more complex: we must execute several joins to find the value that the outer ps_supplycost is looking for.

Consider this expression:

ps_supplycost = (SELECT MIN(ps_supplycost)
                       FROM tpch.partsupp AS ps_min
                       INNER JOIN tpch.supplier AS s_min
                           ON ps_suppkey = s_suppkey
                       INNER JOIN tpch.nation AS n_min
                           ON s_nationkey = n_nationkey
                       INNER JOIN region AS r_min
                           ON n_regionkey = r_regionkey
                       WHERE ps_partkey = p_partkey
                         AND r_name = 'EUROPE')

The key here is decorrelate this query into a join to this decorrelated construct

SELECT ps_partkey, 
       MIN(ps_supplycost) AS min_ps_supplycost
FROM tpch.partsupp AS ps_min
INNER JOIN tpch.supplier AS s_min
    ON ps_suppkey = s_suppkey
INNER JOIN tpch.nation AS n_min
    ON s_nationkey = n_nationkey
INNER JOIN region AS r_min
   ON n_regionkey = r_regionkey
WHERE ps_partkey = p_partkey
    AND r_name = 'EUROPE'
GROUP BY ps_partkey;

This allows us to harvest the filter on r_name and apply it transitively to the join in the outer query.