PlannerTPC-H — TPCH-Q05

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
6,373,251
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,608,212
7.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
1,615,345
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
5,953
6K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
6,330,085
6.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,491,163
6.5M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
254,028
254K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,988
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
6,372,861
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,175,985
7.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,995
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,067,194
1.1M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,015
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
6,372,867
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,533,953
6.5M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
254,041
254K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,988
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -         5  PROJECT SUM(l_extendedprice * (1 - l_discount))
       -         5  SORT SUM(l_extendedprice * (1 - l_discount))
       -         5  GROUP BY HASH n_name AGGREGATE SUM(l_extendedprice * (1 - l_discount))
       -      5953  PROJECT 1 - l_discount, l_extendedprice * (1 - l_discount)
       -      5953  INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -     30728  INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25  │└TABLE SCAN nation
       -     30728  PROJECT s_nationkey = c_nationkey
       -     30728  INNER JOIN HASH ON (l_suppkey,c_nationkey) = (s_suppkey,s_nationkey)
       -     10000  │└TABLE SCAN supplier
       -    773968  INNER JOIN HASH ON o_custkey = c_custkey
       -    150000  │└TABLE SCAN customer
       -    773968  INNER JOIN HASH ON l_orderkey = o_orderkey
       -    214405  │└TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
       -   5998820  TABLE SCAN lineitem
DuckDB
Estimate    Actual  Operator
       -         5  SORT SUM(l_extendedprice * (1 - l_discount))
  272721         5  GROUP BY HASH #0 AGGREGATE SUM(#1)
  287133      6988  PROJECT n_name, l_extendedprice * (1.000 - l_discount)
  287133      6988  INNER JOIN HASH ON l_suppkey = s_suppkeyn_nationkey = s_nationkey
   10000     10000  │└TABLE SCAN supplier
  246102    171079  INNER JOIN HASH ON l_orderkey = o_orderkey
   60992     42896  │└INNER JOIN HASH ON o_custkey = c_custkey
   28846     30048   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
  150000    108053    TABLE SCAN customer WHERE c_custkey >= 3
  300000    213997   TABLE SCAN orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 5998820   5998029  TABLE SCAN lineitem WHERE l_suppkey >= 1
SQL Server
Estimate    Actual  Operator
       5         5  SORT Expr1013
       5         5  PROJECT CASE WHEN Expr1019 = 0 THEN NULL ELSE Expr1020 END AS Expr1013
       5         5  GROUP BY HASH AGGREGATE COUNT(Expr1014) AS Expr1019, SUM(Expr1014) AS Expr1020
    7780      6988  INNER JOIN HASH ON s_suppkey = l_suppkey AND n_nationkey = s_nationkey
   10000     10000  │└TABLE SEEK supplier
  194523    171091  INNER JOIN HASH ON o_orderkey = l_orderkey
   42863     42896  │└INNER JOIN HASH ON c_custkey = o_custkey
   30000     30049   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN LOOP ON n_regionkey = r_regionkey
       1         5     │└TABLE SEEK region WHERE r_name = 'EUROPE'
      25        25     TABLE SEEK nation
  150000    150000    TABLE SEEK customer
  214316    214017   TABLE SEEK orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 5998820   5998820  PROJECT l_extendedprice * (1. - l_discount) AS Expr1014
 5998820   5998820  TABLE SCAN lineitem
PostgreSQL
Estimate    Actual  Operator
      25         5  SORT SUM(l_extendedprice * ('1' - l_discount))
      25         5  GROUP BY SORT n_name AGGREGATE SUM(l_extendedprice * ('1' - l_discount))
     125        25  GROUP BY SORT n_name AGGREGATE PARTIALSUM(l_extendedprice * ('1' - l_discount))
    8410      6990  SORT n_name
    8410      6990  INNER JOIN HASH ON (l_suppkey = s_suppkey) AND (n_nationkey = s_nationkey)
   10000     10000  │└TABLE SCAN supplier
  210835    171090  INNER JOIN HASH ON o_custkey = c_custkey
   12500     30049  │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25    TABLE SCAN nation
   62500    150000   TABLE SCAN customer
 1054170    856050  INNER JOIN HASH ON l_orderkey = o_orderkey
  340125    214015  │└TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
 7498525   5998820  TABLE SCAN lineitem

Commentary



This filter on o_orderdate

This is a quick test for partitioning elimination.

We don't have to join via supplier to harvest the filter - we can go via customer and filter that first. customer is a much larger table than supplier (by 15x) so it is better to apply an aggressive filter on a larger table instead of a smaller one.

Since orders is already filtered, we can also reduce it further by pre-joining to customer, nation and region of to orders before we join to lineitem. If we do this, we still have the filter on c_nationkey = s_nationkey later in the query. Even though we took the filter early on customer we must still apply the filter to supplier later.

By combining the filters r_name = 'ASIA' and o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' in this way, we can greatly reduce the work needed in the query.