Estimate Actual Operator 10 10 LIMIT 0
10 10 SORT Expr1009, o_orderdate
222132 11281 INNERJOINHASHON l_orderkey = o_orderkey
228150 145216 │└INNERJOINHASHON o_custkey = c_custkey
29868 29949 ││└TABLESCAN customer WHERE c_mktsegment = 'MACHINERY'
72698 145216 │TABLESCAN orders WHERE o_orderdate < '1995-03-15' AND BLOOM(o_custkey)
124448 11281 AGGREGATESUM(Expr1010) AS Expr1009 GROUPBYHASH l_orderkey
323565 29857 PROJECT l_extendedprice * (1. - l_discount) AS Expr1010
323565 29857 TABLESCAN lineitem WHERE l_shipdate > '1995-03-15' AND BLOOM(l_orderkey)
Commentary
Optimal Join Order
Optimisers only seeking left deep trees will consume signficant memory while executing the query.
The key optimisation in join ordering is to create a bushy join on customer and orders before joining to lineitem. This allows both l_shipdate > '1995-03-15' and o_orderdate < '1995-03-15' to be be pre-filtered before the join to lineitem greatly reducing the amount of data to be processed.
Optimisers who understand bloom filters will be able to harvest further filters by combining the customer and orders filters before accessing lineitem.
Heap Sort
The LIMIT 10 in the query is best optimised with heap sort nodes. This is typically done with a special sort operator that can cut the stream early. Good optimisers and execution engines find this optimisation and reduce the amount of data to be sorted before emitting the top 10.
In the dbprove tool, this shows up a the sort not only emitting a few rows.