Subject OLAP tuning in Firebird 2.1
Author Richard Wesley
Hi All -

We are trying to understand how Firebird optimises certain kinds of
OLAP queries. These questions are from a meeting and I am passing
them along for any comments/feedback anyone might have.

Note that we are NOT trying to optimise a single database; rather we
are trying to determine rules for how to create databases for
particular classes of user data. These data sets may be narrow (a few
columns) or wide (maybe up to 50 columns) and vary in length from a
few rows up to tens of millions. We can vary the algorithm based on
these parameters (and we can even get into row width in bytes instead
of columns if necessary).

Hopefully these rules (and this discussion) will be of wider interest
than just us: I know that Thomas Steinmaurer is writing a series of
blog articles on porting TCP-H to Firebird and this is in a similar
vein.

First of all, the table in question

SQL> show table "TableauExtract";
Account ID VARCHAR(18) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Account TAC VARCHAR(10) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Account TAE VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Account THR VARCHAR(5) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
AccountParent ID VARCHAR(18) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
AccountParent VARCHAR(100) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Account VARCHAR(100) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Addr City VARCHAR(40) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Addr Country VARCHAR(40) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Addr State VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Email VARCHAR(80) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact ID VARCHAR(18) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Name VARCHAR(128) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Phone VARCHAR(40) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Status VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Type II VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Type VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact Unread by Owner VARCHAR(5) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
LeadContact VARCHAR(20) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
MKT LeadContact Source Detail VARCHAR(100) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
MKT LeadContact Source VARCHAR(40) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Number of Records INTEGER Not Null
Object Amount NUMERIC(18, 2) Nullable
Object DateCompleted TIMESTAMP Nullable
Object DateCreated TIMESTAMP Not Null
Object Detail VARCHAR(1024) CHARACTER SET UTF8
Nullable
COLLATE UNICODE_CI
Object ID VARCHAR(18) CHARACTER SET UTF8 Not Null
COLLATE UNICODE_CI
Object Name VARCHAR(186) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Object Priority VARCHAR(80) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Object Source VARCHAR(11) CHARACTER SET UTF8 Not Null
COLLATE UNICODE_CI
Object Status VARCHAR(80) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Object Type Outbound VARCHAR(5) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Object Type VARCHAR(80) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Opportunity Contact Primary VARCHAR(5) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Opportunity Contact Role VARCHAR(40) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Owner Assigned VARCHAR(128) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Owner CreatedBy VARCHAR(128) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Salesperson Account VARCHAR(30) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Salesperson LeadContact VARCHAR(30) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Salesperson Territory VARCHAR(30) CHARACTER SET UTF8 Nullable
COLLATE UNICODE_CI
Today TIMESTAMP Not Null

SQL> show indexes;
_tidx_128_10a INDEX ON TableauExtract(LeadContact Addr State)
_tidx_128_11a INDEX ON TableauExtract(LeadContact Email)
_tidx_128_12a INDEX ON TableauExtract(LeadContact ID)
_tidx_128_13a INDEX ON TableauExtract(LeadContact Name)
_tidx_128_14a INDEX ON TableauExtract(LeadContact Phone)
_tidx_128_15a INDEX ON TableauExtract(LeadContact Status)
_tidx_128_16a INDEX ON TableauExtract(LeadContact Type II)
_tidx_128_17a INDEX ON TableauExtract(LeadContact Type)
_tidx_128_18a INDEX ON TableauExtract(LeadContact Unread by Owner)
_tidx_128_19a INDEX ON TableauExtract(LeadContact)
_tidx_128_1a INDEX ON TableauExtract(Account ID)
_tidx_128_20a INDEX ON TableauExtract(MKT LeadContact Source Detail)
_tidx_128_21a INDEX ON TableauExtract(MKT LeadContact Source)
_tidx_128_24a INDEX ON TableauExtract(Object DateCompleted)
_tidx_128_25a INDEX ON TableauExtract(Object DateCreated)
_tidx_128_27a INDEX ON TableauExtract(Object ID)
_tidx_128_29a INDEX ON TableauExtract(Object Priority)
_tidx_128_2a INDEX ON TableauExtract(Account TAC)
_tidx_128_30a INDEX ON TableauExtract(Object Source)
_tidx_128_31a INDEX ON TableauExtract(Object Status)
_tidx_128_32a INDEX ON TableauExtract(Object Type Outbound)
_tidx_128_33a INDEX ON TableauExtract(Object Type)
_tidx_128_34a INDEX ON TableauExtract(Opportunity Contact Primary)
_tidx_128_35a INDEX ON TableauExtract(Opportunity Contact Role)
_tidx_128_36a INDEX ON TableauExtract(Owner Assigned)
_tidx_128_37a INDEX ON TableauExtract(Owner CreatedBy)
_tidx_128_38a INDEX ON TableauExtract(Salesperson Account)
_tidx_128_39a INDEX ON TableauExtract(Salesperson LeadContact)
_tidx_128_3a INDEX ON TableauExtract(Account TAE)
_tidx_128_40a INDEX ON TableauExtract(Salesperson Territory)
_tidx_128_41a INDEX ON TableauExtract(Today)
_tidx_128_4a INDEX ON TableauExtract(Account THR)
_tidx_128_5a INDEX ON TableauExtract(AccountParent ID)
_tidx_128_6a INDEX ON TableauExtract(AccountParent)
_tidx_128_7a INDEX ON TableauExtract(Account)
_tidx_128_8a INDEX ON TableauExtract(LeadContact Addr City)
_tidx_128_9a INDEX ON TableauExtract(LeadContact Addr Country)
SQL>

So, here is the first question:

-----
When I connect to the extract and drag [Object Source] to rows, we run:

Query 1: Trying to get the domain of [Object Source] by GROUP BY
SELECT ("TableauExtract"."Object Source") AS "none:Object Source:nk"
FROM "TableauExtract"
GROUP BY 1
</QUERY>
<PLAN>
PLAN (TableauExtract ORDER _tidx_128_30a)
</PLAN>
[Time] Running the query took 4.2705 sec.
[Time] Getting the records took 0.0002 sec.

It takes roughly 4 seconds to table scan this data source so it
appears that this is doing a table scan based on time versus
retrieving the unique keys from the index. That makes sense to me when
we include a measure, such as in the following, since computing the
measure will require touching all the records. But in the above, we
are only operating on the key of the index – why touch the data pages?

Query 2: Computing a measure – we need to table scan.
SELECT ("TableauExtract"."Object Source") AS "none:Object Source:nk",
(SUM("TableauExtract"."Object Amount")) AS "sum:Object Amount:qk"
FROM "TableauExtract"
GROUP BY 1
</QUERY>
<PLAN>
PLAN (TableauExtract ORDER _tidx_128_30a)
</PLAN>
[Time] Running the query took 4.3217 sec.
[Time] Getting the records took 0.0002 sec.

I thought perhaps the GROUP BY confused it. But here is another
example just doing a “SELECT DISTINCT”:

Query 3: Trying to get the domain by SELECT DISTINCT
SELECT DISTINCT "TableauExtract"."Object Source" AS "Object Source"
FROM "TableauExtract"
ORDER BY 1 ASC
</QUERY>
<PLAN>
PLAN SORT ((TableauExtract NATURAL))
</PLAN>
[Time] Running the query took 4.2456 sec.
[Time] Getting the records took 0.0002 sec.

----

So the bottom line is: What is the fastest way to determine the
members of a column in a local database? Is there any way (e.g. by
fiddling with transaction settings) to avoid a table scan?
Denormalisation works for this query, but that makes table scans
involving a few joins quite a bit slower - and as this is what we
usually do, we are not sure which way to jump.

TIA
________________________________________________________
Richard Wesley Senior Software Developer Tableau
Software
Visit: http://www.trytableau.com/now.html