Subject | Re: [firebird-support] How does the optimizer of Firebird determine join order? |
---|---|
Author | Ann Harrison |
Post date | 2014-05-15T01:35:09Z |
On Tue, May 13, 2014 at 10:37 PM, Zhan Li zhanli89@... [firebird-support] <firebird-support@yahoogroups.com> wrote:
Hi All,I've been studying the optimizer of firebird for a while, and I was wondering how the optimizer chooses join order? As most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R database project, does the optimizer of firebird use similar strategy?I can't speak to the details of the current optimizer, but the original design of the optimizer was developed concurrently with IBM's System R, so neither derived from the other. Firebird has a cost-based optimizer which seeks to minimize the number rows returned as the intermediate product. The optimizer focuses on inner joins. Outer joins have an implicit order - the side that is preserved (e.g. the first named term in a left outer join) must precede the optional side. Yes, if you have something like:from customers c,left outer join addresses a on (a.cust_id = c.cust_id)left outer join invoices i on (i.cust_id = c.cust_id)you would necessarily look up customers first and could chose to look at addresses or invoices in either order. But that has very little to do with the overall cost.These examples are simple - the actual cases are joins of many tables.Before it begins optimization, Firebird distributes equalities.from orders oinner join customers c on (o.cust_id = c.cust_id)where c.cust_id = 12345is transformed into:from orders oinner join customers c on (o.cust_id = c.cust_id)where c.cust_id = 12345 and o.cust_id = 12435That allows the optimizer to choose between starting with customers (a single valued lookup) and starting with orders (potentially multi-valued, but possibly desirable, depending on what else was included in the query.The first design of the optimizer considered all possible orders for inner joins, estimating the cost based on the selectivity of the index and the nature of the conjunct. An equality comparison where one side is identified by a comparison between a constant and a unique key and the other on an equality between that unique key and a unique key in the second table requires looking at two records (and a few index entries). For example:from orders oinner join customers c on (o.cust_id = c.cust_id)where o.order_id = 12345Each order has only one value for cust_id, so the optimizer chooses to look up the order by its unique key, then use the cust_id in the order to look up the customer by its unique key.from orders oinner join customers c on (o.cust_id = c.cust_id)where c.cust_state = 'RI'Here the optimizer considers the selectivity for the index on "state = 'RI'" and compares it with the cardinalities of orders and customers. If the company doesn't do much business, but all of it is in Rhode Island, the cost may be lower if it reads all the orders, does a single row look-up on customers, and throws out those that aren't in Rhode Island.Assuming that the index selectivity was reasonably correct, and the index values were distributed evenly (i.e. no cases where 90% of the entries had one value and the other 10% were random), eventually that algorithm got a good join order. At the time, Oracle used semantic ordering: tables were evaluated in the order they appeared in the query. However, the full cross product of joins of more than9 or so, the full cross product lead to problems, "the optimization took twelve hours. Running the query too .05 milliseconds." Good answer, bad process.Interbase (as it was then) adopted a strategy of weighting results. The first ordering it considered was analyzed in full and the full cost of the intermediate product was saved. The subsequent orderings compared themselves with the first as each table was introduced and if at any step the cost was greater than the first, the ordering was discarded.I think there was a subsequent reorganization that considers the longest possible chain of equalities first.A goal, at least for Interbase and early versions of Firebird, was to emphasize getting good results for good queries and not chasing silly queries - like returning no results from the compiler if one of the conditions was "and 1 = 2".Good luck,Annf