firebird-support - RE: [firebird-support] Re: Why it's soo slow ? it's just a very simple select ...

Subject	RE: [firebird-support] Re: Why it's soo slow ? it's just a very simple select ...
Author	David Hollings
Post date	2012-03-09T15:01:29Z

Hi All,

Been watching recently and decided to have a quick gander hoping to
reproduce the OPs original issues with slower selects from a table
containing a varchar(10000) versus a table containing no varchar(10000)
field.

Test rig:

Centos 5.3 32 bit
Firebird 2.5.0 Classic Server 32 bit

Procedure:

- I created a database containing two tables - DESC_EMPTY (IDOBJ
VARCHAR(20)) and DESC_VARCHAR (IDOBJ VARCHAR(20), COMMENT VARCHAR(10000)).
- Populated DESC_VARCHAR with 50 million test records - IDOBJ populated with
GEN_UUID() and COMMENT populated with random concatenated GEN_UUID()s.
- Populated DESC_EMPTY using "insert into desc_empty select idobj from
desc_varchar;"
- Create and index on each tables idobj column.
- Perform a backup and restore.

Results:

[root@ppdev prep]# ls -lh
-rw-rw---- 1 root root 22G Mar 9 15:21 test2.gdb

----------------------------------------------------------
-A quick gstat -r to see what's going on
----------------------------------------------------------

[root@ppdev prep]# gstat -r test2.gdb

Database "test2.gdb"
Database header page information:
Flags 0
Checksum 12345
Generation 16
Page size 8192
ODS version 11.2
Oldest transaction 1
Oldest active 2
Oldest snapshot 2
Next transaction 8
Bumped transaction 1
Sequence number 0
Next attachment ID 2
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Mar 9, 2012 14:20:12
Attributes force write

Variable header data:
Sweep interval: 20000
*END*

Database file sequence:
File test2.gdb is the only file

Analyzing database pages ...
DESC_EMPTY (129)
Primary pointer page: 153, Index root page: 154
Average record length: 25.00, total records: 50000000
Average version length: 0.00, total versions: 0, max versions: 0
Data pages: 403226, data page slots: 403226, average fill: 64%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 403225
80 - 99% = 0

Index DESC_EMPTY_IDX (0)
Depth: 4, leaf buckets: 123965, nodes: 50000000
Average data length: 13.32, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 123964

DESC_VARCHAR (128)
Primary pointer page: 151, Index root page: 152
Average record length: 312.99, total records: 50000000
Average version length: 0.00, total versions: 0, max versions: 0
Data pages: 2173914, data page slots: 2173914, average fill: 93%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2173913

Index DESC_VARCHAR_IDX (0)
Depth: 4, leaf buckets: 125733, nodes: 50000000
Average data length: 13.32, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 125733

----------------------------------------------------------
- Perform each test select twice - initially this will populate the OS
cache. The second run will eliminate disk IO - we're interested in the page
reads regardless.
----------------------------------------------------------

[root@ppdev prep]# isql test2.gdb
Database: test2.gdb
SQL> show tables;
DESC_EMPTY DESC_VARCHAR

SQL> show table desc_varchar;
IDOBJ VARCHAR(20) Nullable
COMMENT VARCHAR(10000) Nullable
SQL> show table desc_empty;
IDOBJ VARCHAR(20) Nullable
SQL> set stats on;
SQL> set plan on;
SQL> select idobj from desc_varchar where idobj = 'MONKEY';

PLAN (DESC_VARCHAR INDEX (DESC_VARCHAR_IDX))
Current memory = 981988
Delta memory = 71796
Max memory = 1067700
Elapsed time= 8.56 sec
Cpu = 0.00 sec
Buffers = 75
Reads = 1167
Writes = 0
Fetches = 1299
SQL> select idobj from desc_empty where idobj = 'MONKEY';

PLAN (DESC_EMPTY INDEX (DESC_EMPTY_IDX))
Current memory = 983476
Delta memory = 1488
Max memory = 1116260
Elapsed time= 1.89 sec
Cpu = 0.00 sec
Buffers = 75
Reads = 238
Writes = 0
Fetches = 273
SQL> select idobj from desc_varchar where idobj = 'MONKEY';

PLAN (DESC_VARCHAR INDEX (DESC_VARCHAR_IDX))
Current memory = 983480
Delta memory = 4
Max memory = 1116260
Elapsed time= 0.01 sec
Cpu = 0.00 sec
Buffers = 75
Reads = 1143
Writes = 0
Fetches = 1143
SQL> select idobj from desc_empty where idobj = 'MONKEY';

PLAN (DESC_EMPTY INDEX (DESC_EMPTY_IDX))
Current memory = 983476
Delta memory = -4
Max memory = 1116260
Elapsed time= 0.00 sec
Cpu = 0.00 sec
Buffers = 75
Reads = 221
Writes = 0
Fetches = 221
SQL> show indexes;
----------------------------------------------------------
- show indices... just to be safe ;)
----------------------------------------------------------
DESC_EMPTY_IDX INDEX ON DESC_EMPTY(IDOBJ)
DESC_VARCHAR_IDX INDEX ON DESC_VARCHAR(IDOBJ)
SQL>

----------------------------------------------------------

Conclusion:

Baffled why a select (which could never possibly return a result) reads
~1150 pages from DESC_VARCHAR but only ~230 pages from DESC_EMPTY given that
the two indices on IDOBJ columns are virtually identical (identical data,
identical fills etc).
I can only image that the unrelated contents of a table somehow impact on
the performance of an index - no idea why.
Granted, the OS cache in Linux combined with sufficient RAM means that it's
more of a non-issue than an issue for me. I can still imagine that large
databases with a high number of concurrent users querying a random set of
values would see higher disk utilisation than they should.

Anyhow, just thought I'd share the results of the quick test I did. Hope
this helps someone.

Regards,
David