Subject | Re: [Firebird-Architect] Re: Bi-directional indexes |
---|---|
Author | Jim Starkey |
Post date | 2005-06-25T11:53:14Z |
Martijn Tonies wrote:
out to be a jolly discussion, here is a short primer on how the engine
fetches records:
1. Everything starts with a record number, whether zero, the previous
record plus one, the next set bit number in a bit map from an
index scan, a record number from an index walk, or a record number
from a dbkey.
2. The engine decomposes the record number into pointer page index,
data page slot, and record slot. The pointer and data pages are
fetched in the indicated record (fragment) verified as existing
and primary record.
3. Starting from the primary record, back record versions are
traversed until a record version consistent with the request's
transaction is found. If the next older version is known to be a
delta version, the data is first reconstructed by chasing record
fragments.
4. If the selected record version is a delta version, it is applied
to the previous version data. If the record is fragmented, the
fragments are found and applied.
5. The boolean expression, if any, is evaluated against the record.
What comes out of this process is not a set of records, but a stream of
sequentially materialized records. Because of the multi-generational
nature of the system, it isn't possible to determine whether any record
identified by number is a member of a stream stream until the record is
fully materialized.
If the record stream is sorted, a sort record consisting of a) a mangled
sort key, b) referenced fields, and c) record numbers for the tables
making up the join is created. After the sort, the record context is
reconstructed from the sort record (without, happily, the need to
refetech the records).
So other than the sort case, there is nothing like a result set in the
system, and even the set of sort records exists only as a tree of runs
to be merged.
The questions to be answered are when records are to be materialized and
if a materialized record is anticipated to be revisted, what state is
kept and where is it kept. Conventional wisdom is that big, transient
things don't belong in memory. Blobs, for example, are materialized
page by page on demand, intermediate sort runs are flushed to temp
files, etc. If a materialized result set is to be retained, it has to
go somewhere.
Another part of traditional thinking is that the structures to scroll
cursors belong on the client, not the server. If client has already
received X records and wants to scroll back Y records, why is this a
server operation? This is not to say that the API shouldn't make this
tranparent, just that sufficient context should be present to allow
client side implementation.
faster to fetch records in physical order and sort them than it is to
fetch records in random physical order (and, relatively speaking,
getting even faster since CPU speed, memory speed, and memory sizes
increase faster than disk throughput). The disadvantage of sort, of
course, is that you have to materialize all records before you know
which is first.
Ann is really talking about walking an index backwards -- fetching
records one by one based on index node start at the end of an index then
going forward. The problem is that the index can't be locked when
fetching a record, so when it's time to go back to the index to find the
next record number, that index may have changed quite radically.
>Doesn't it make more sense to get a useful query cache?At the expense of introducing a little practicality into what is turning
>
>If FIRST n SKIP m from some largish resultset is what you're asking
>for several times in a row, why isn't the largish resultset cached?
>
out to be a jolly discussion, here is a short primer on how the engine
fetches records:
1. Everything starts with a record number, whether zero, the previous
record plus one, the next set bit number in a bit map from an
index scan, a record number from an index walk, or a record number
from a dbkey.
2. The engine decomposes the record number into pointer page index,
data page slot, and record slot. The pointer and data pages are
fetched in the indicated record (fragment) verified as existing
and primary record.
3. Starting from the primary record, back record versions are
traversed until a record version consistent with the request's
transaction is found. If the next older version is known to be a
delta version, the data is first reconstructed by chasing record
fragments.
4. If the selected record version is a delta version, it is applied
to the previous version data. If the record is fragmented, the
fragments are found and applied.
5. The boolean expression, if any, is evaluated against the record.
What comes out of this process is not a set of records, but a stream of
sequentially materialized records. Because of the multi-generational
nature of the system, it isn't possible to determine whether any record
identified by number is a member of a stream stream until the record is
fully materialized.
If the record stream is sorted, a sort record consisting of a) a mangled
sort key, b) referenced fields, and c) record numbers for the tables
making up the join is created. After the sort, the record context is
reconstructed from the sort record (without, happily, the need to
refetech the records).
So other than the sort case, there is nothing like a result set in the
system, and even the set of sort records exists only as a tree of runs
to be merged.
The questions to be answered are when records are to be materialized and
if a materialized record is anticipated to be revisted, what state is
kept and where is it kept. Conventional wisdom is that big, transient
things don't belong in memory. Blobs, for example, are materialized
page by page on demand, intermediate sort runs are flushed to temp
files, etc. If a materialized result set is to be retained, it has to
go somewhere.
Another part of traditional thinking is that the structures to scroll
cursors belong on the client, not the server. If client has already
received X records and wants to scroll back Y records, why is this a
server operation? This is not to say that the API shouldn't make this
tranparent, just that sufficient context should be present to allow
client side implementation.
>It's a question of time to first record, not throughput. It is always
>What should indices have to do with sorting? Yes, I know they're used
>sometimes to do the job, but if you're getting several "pages" of rows
>(not datapages) doesn't a good cache make more sense?
>
>
>
faster to fetch records in physical order and sort them than it is to
fetch records in random physical order (and, relatively speaking,
getting even faster since CPU speed, memory speed, and memory sizes
increase faster than disk throughput). The disadvantage of sort, of
course, is that you have to materialize all records before you know
which is first.
Ann is really talking about walking an index backwards -- fetching
records one by one based on index node start at the end of an index then
going forward. The problem is that the index can't be locked when
fetching a record, so when it's time to go back to the index to find the
next record number, that index may have changed quite radically.