Subject Re: [firebird-support] Reads inconsistency between FB TraceManager and IB Planalyser
Author Ann Harrison

On Aug 11, 2015, at 1:30 PM, 'Louis van Alphen' louis@... [firebird-support] <firebird-support@yahoogroups.com> wrote:

Thanks Ann, so maybe the 2 tools' metric are different. So maybe FBTM
includes all (low level) reads and IP only user table reads.

Or maybe one runs some queries before the one it measures so all the system table queries are compiled, optimized, and executed, filling the cache with good stuff.

Also, if I remove the order by, IP reports a drastic reduction is
execution time i.e. down to 6msec. Does the ORDER BY on the PK make such a
difference?


Unh, maybe.   Between the list, my mail system and yours, I've lost most of the formatting and can't look at the original while typing (grrr....)   You seem to use left outer joins and that can be a problem because (in general) the order in which outer joins are declared in the query is the order in which they have to be accessed in the query plan.   

For example, a three way join of students, registrations, and courses with inner joins can be run starting with any of the three tables and get the same results.  

Select s.student_name, c.course_name 
    from students s 
         join registrations r on r.student_id = s.student_id
         join courses c on c.course_id = r.course_id

The optimizer can choose to start with registrations and lookup students and courses by primary key, or students and look up a registration by student_id and a course by the course_id in registrations, or courses -> registrations -> students.  Or sort the students and registrations and merge them, looking up courses from the matching pairs.  Or other things, probably.

However, this query can be executed in only one way: students -> registrations -> courses

Select s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id

If that way isn't optimal, too bad.  The query must be executed that way or you'll miss all the students who aren't registered for courses and all the registrations that don't correspond to any courses.

It may be that adding the sort, you've convinced Firebird that it has to read and sort all the records in that table, then start from the sorted result.  Try using one of the various syntaxes that tells it to return only the first record found that matches the criteria.  That may cause Firebird to walk the index ... read records in index order ...  rather than reading the table and sorting it.

Now that probably sounds really dumb, but in fact, walking a table in index order is generally a lot slower than reading it in physical order and sorting the result.  Yes, sort is n(log n), but there's a big K applied that is the cost of a random record read.  So, if you expect a lot of results, sort first and ask for matches later....   If not, use 'Select FIRST <n>' or 'LIMIT <n>' or whatever the standards committee chose to bless ... maybe  'FETCH FIRST <n>'.  Any of those hints will convince Firebird to walk the table in index order.

This query is likely to read and sort the students first, then start looking for the rest of the data unless there's a good index on student_name.  

Select s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id
     where s.student_name = 'Dinah MacQuarie'
     order by s.student_id

This may give a faster result in the absence of an student_name index.  It may not, but do consider trying it when playing with query forms...

Select first 30 s.student_name, c.course_name 
    from students s 
        left outer join registrations r on r.student_id = s.student_id
         left outer join courses c on c.course_id = r.course_id
     where s.student_name = 'Dinah MacQuarie'
     order by s.student_id

It's worth noting that Firebird does not store records in primay key order unless the records happen to arrive in primary key order.  Even when records are created in primary key order - as with a generated key - deleted records may be replaced with records created much later.   Lots of databases cluster records on primary key, making it unnecessary to sort when reading the primary key.   That has other costs, particularly with natural keys...  This post is already way to long to take on that issue.


Now, just to complicate things, it's possible to turn left outer joins to inner joins - at least if you're a person.   Maybe the Firebird optimizer knows about this, and maybe it doesn't.  Certainly in the previous century one database developer - initials JAS - thought that if you were dumb enough to use outer joins when you shouldn't, the optimizer shouldn't reward your stupidity.   He's become more benevolent in his old age.  Anyway, here's the case:

Select s.student_name, c.course_name 
    from students s 
         join registrations r on r.student_id = s.student_id
         join courses c on c.course_id = r.course_id
     where c.department_id = (select d.dpeartment_id from departments d
                                                    where d.department_name = 'Computer Science')

That means you're going to throw out all the students without registrations, all of the registrations without courses, and all of the courses that aren't in Computer Science.  So it devolves back to inner joins and can be optimized in the more or less rational order courses -> registrations -> students.  Unless you've got a university with tens of students and hundreds of CS courses...

So to add to my confusion, I don't know if your outer joins have been corrected to be inner joins based on the criteria in the query on the view.

My other question is why is there 'WARNING' against some of the indexes? Are
the stats out of spec?

I've deleted the index stats because the combination of mail programs has left them unintelligible garbage - look back in this thread for a formatted version.   Basically the only index that doesn't have a warning is unique and has a selectivity of 3.nn e07.  There's another that's 3.nn e06, which is pretty good - but IIRC (did I mention hating my mail program?) has a very long duplicate chain.  Look a the numbers of unique values, the numbers of duplicates in general, and the largest number of duplicates for any one value.

Again, this problem is complicated by reality.  

Lets assume your students can be alive or dead.  Most of the students are alive.  An index on student.death_state would be useless for looking for live students, but very useful for looking for dead students.  If you spend a lot of time looking for the courses taken by dead students, the index is valuable.  If not, not.  So an index with lots of duplicates on one value is useful, if you're looking for other values.   

Similarly, an index with a fair number of duplicates on each value can be useful - think of student birth dates.  Most of them are going to be between 1992 and 1997 so a large university will have lots of duplicates, but the index is great for finding the young and the old, and OK for figuring out who can drink legally in Wyoming.

The index analyzer has no clue about your patterns of data usage.  You do.  Look at the statistics and decide if there are indexes that don't generally eliminate 90% of the records in the table when used for an equality retrieval.


I am not understanding what is going on.

I doubt that I have helped.  However, let me summarize.  

Don't sort things if you don't care about the order of retrieval.  If you do, and you can anticipate the number of rows that should be returned within an order of magnitude, limit the number of records returned.

Don't use an outer join unless you need to preserve unmatched records.

Do consider the distribution of duplicates in your indexes vs. the criteria in critical queries.

Good luck,


Ann