Subject Re: Firebird 2.0 Indexing
Author buppcpp
> I know you are talking about millions records... but
>
> Could you show your query ?
>
> what more info could you provide ? (table structure, index
statistics,
> database statistics, cache size, page size, etc.) Maybe something
here
> could enlighten somebody to shed a light on your case.
>
> Distinct needs a sort, as told earlier, in general a sort is
performed
> faster than an indexed scan (to get the entire result set, but
slower to
> get the first row), it is better to find all records in natural
order
> (for example) and sort it then in memory then go the index page
read it
> and than read a data page, then go and read the index page again
and
> then read the same data page again.


OK, here's the info you requested:
(NOTE: This is now on version 1.5.2, version 2.0 wasn't much better.)

TABLE LAYOUT:

SKU VARCHAR(16) (PK)
STORE_NO INTEGER (PK) (this field also has an individual index on
it)
COST DECIMAL(15, 4)
PRICE DECIMAL(15, 4)
.
.
.
(15 other fields are also in this table)

Primary Key Stats (PK_SKU_STORE). 0.0000004474044
Store_No Key Stats (IDX_STORE). 0.5000000000000

Table has 2,235,114 records.

Cache Size = 10240
Page Size = 8192
DB Stats:
Database header page information:
Flags 0
Checksum 12345
Generation 27646
Page size 8192
ODS version 10.1
Oldest transaction 27641
Oldest active 27642
Oldest snapshot 27642
Next transaction 27643
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date May 21, 2005 18:28:11
Attributes

Queries:
1. SELECT store_no FROM mytable GROUP BY 1;
(PLAN MYTABLE IDX_STORE)
Prepare: 20 ms Execute: 2 min 37 s 657 ms Fetch: 2 min 30 s 617 ms


2. SELECT DISTINCT store_no FROM mytable;
PLAN SORT ((MYTABLE NATURAL))
Prepare: 30 ms Execute: 1 min 24 s 141 ms Fetch: 180 ms
(NOTE: The above query was ran right after the first query, so this
result is with some caching involved. It much worst with a fresh
start.)

The database I want to convert from performs these same queries in
under 15 secs, using the EXACT same data, table and index layout on
the same machine.

I hope someone can now look into this and speed DISTINCT and GROUP
BY up.

Thanks