Subject | Re: [firebird-support] Benefit from Sequential Insert of High/Low PK |
---|---|
Author | Ann Harrison |
Post date | 2013-01-27T19:23:21Z |
On Fri, Jan 25, 2013 at 4:21 PM, Jeff <jeffplata@...> wrote:
versions use approximately the same index structures but have lots of bugs
fixed, including several important ones in the index code.
and indexes separately - even primary key indexes, so its talk about
"shuffling" data is irrelevant. But still there are advantages to
inserting keys into an index in ascending (for ascending indexes) or
descending (for descending indexes) order. The most efficient way to
generate primary keys is with a generator - also called a "sequence" in
newer versions.
designed to handle random inserts into indexes with reasonable efficiency,
but building indexes sequentially creates a dense index at low insert cost.
When creating a new index on an existing table, Firebird reads the table,
sorting by they index key and builds the index from the sorted output.
Why is that fast? Two reasons, neither of which as to do with the
(relatively) low cost of moving index entries in memory.
The first is that random index entries tend to land on random pages in the
index. Unless you can hold the entire index (and everything else you need)
in memory, that means that pages will be written and read multiple times.
<rant>One of the key misunderstandings in databases is that an index is a
good alternative to a sort because the cost of an index look up is K*n,
while the cost of a sort is L*nlog(n). It's rarely noticed that the value
of K (the cost of a page read) is huge compared with log(n).</rant>
Filling an index page with ordered values and going on to the next page is
more efficient that putting one entry on one page, the next on another, and
so on, even if all the pages are in memory.
The second reason to store rows in the order of their index (i.e. not just
sorted, but sorted ascending for ascending, descending for descending) is
prefix compression. Firebird drop the first bytes of a key if they match
the previous key value. So if you store AAAAAA, AAAAAAB, AAAAAAC, AAAAABA,
what goes into the index is AAAAAA, 6B, 6C, 4BA. Prefix compression makes
the indexes dense, meaning that you read fewer index pages to get to the
data you want. Clearly if you store those rows in reverse order, the first
key in is AAAAABA. When you store AAAAAAC, the index contains AAAAAAC and
the second key value becomes 5BA. Each subsequent entry requires a change
to the next older entry.
I based this question on this article:
GUIDs into strings (or byte arrays) in ascending order, the algorithm will
reduce the size of indexes and make inserts faster.
Cheers,
Ann
[Non-text portions of this message have been removed]
>OK, that's scary right there. Firebird 1.5 is ten years old. Newer
>
> (Using FB 1.5)
>
versions use approximately the same index structures but have lots of bugs
fixed, including several important ones in the index code.
>Yes. Unlike the databases that the article considers, Firebird stores data
> Do inserts in FB benefit from ordered or sequential PK?
>
and indexes separately - even primary key indexes, so its talk about
"shuffling" data is irrelevant. But still there are advantages to
inserting keys into an index in ascending (for ascending indexes) or
descending (for descending indexes) order. The most efficient way to
generate primary keys is with a generator - also called a "sequence" in
newer versions.
> Please allow me to clarify. I intend to use High/Low for table PKs. WithI don't know what High/Low is, so that's hard to answer. Firebird is
> this approach, it is very possible that PKs will not be in sequential order
> as they are inserted into the db. Will this be an issue in FB?
>
designed to handle random inserts into indexes with reasonable efficiency,
but building indexes sequentially creates a dense index at low insert cost.
When creating a new index on an existing table, Firebird reads the table,
sorting by they index key and builds the index from the sorted output.
Why is that fast? Two reasons, neither of which as to do with the
(relatively) low cost of moving index entries in memory.
The first is that random index entries tend to land on random pages in the
index. Unless you can hold the entire index (and everything else you need)
in memory, that means that pages will be written and read multiple times.
<rant>One of the key misunderstandings in databases is that an index is a
good alternative to a sort because the cost of an index look up is K*n,
while the cost of a sort is L*nlog(n). It's rarely noticed that the value
of K (the cost of a page read) is huge compared with log(n).</rant>
Filling an index page with ordered values and going on to the next page is
more efficient that putting one entry on one page, the next on another, and
so on, even if all the pages are in memory.
The second reason to store rows in the order of their index (i.e. not just
sorted, but sorted ascending for ascending, descending for descending) is
prefix compression. Firebird drop the first bytes of a key if they match
the previous key value. So if you store AAAAAA, AAAAAAB, AAAAAAC, AAAAABA,
what goes into the index is AAAAAA, 6B, 6C, 4BA. Prefix compression makes
the indexes dense, meaning that you read fewer index pages to get to the
data you want. Clearly if you store those rows in reverse order, the first
key in is AAAAABA. When you store AAAAAAC, the index contains AAAAAAC and
the second key value becomes 5BA. Each subsequent entry requires a change
to the next older entry.
I based this question on this article:
>I've only scanned the article, but if it offers you a reliable way to turn
> http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database
> ort/ <http://groups.yahoo.com/group/firebird-support/>
>
>
GUIDs into strings (or byte arrays) in ascending order, the algorithm will
reduce the size of indexes and make inserts faster.
Cheers,
Ann
[Non-text portions of this message have been removed]