firebird-support - Re: Benefit from Sequential Insert of High/Low PK

Subject	Re: Benefit from Sequential Insert of High/Low PK
Author	Jeff
Post date	2013-01-28T19:21:14Z

Ann, thanks for your answer. Thanks also goes to all others who have shared their knowledge. Please see inline.

--- In firebird-support@yahoogroups.com, Ann Harrison wrote:
>
> On Fri, Jan 25, 2013 at 4:21 PM, Jeff wrote:
> >
> >
> > (Using FB 1.5)
> >
>
> OK, that's scary right there. Firebird 1.5 is ten years old. Newer

Not scary for me, it's rather rock solid. Amazing :) I'm poised at upgrading though.

> versions use approximately the same index structures but have lots of bugs
> fixed, including several important ones in the index code.
>
> >
> > Do inserts in FB benefit from ordered or sequential PK?
> >
>
> Yes. Unlike the databases that the article considers, Firebird stores data
> and indexes separately - even primary key indexes, so its talk about
> "shuffling" data is irrelevant. But still there are advantages to
> inserting keys into an index in ascending (for ascending indexes) or
> descending (for descending indexes) order. The most efficient way to
> generate primary keys is with a generator - also called a "sequence" in
> newer versions.
>
>
> > Please allow me to clarify. I intend to use High/Low for table PKs. With
> > this approach, it is very possible that PKs will not be in sequential order
> > as they are inserted into the db. Will this be an issue in FB?
> >
>
> I don't know what High/Low is, so that's hard to answer. Firebird is
> designed to handle random inserts into indexes with reasonable efficiency,
> but building indexes sequentially creates a dense index at low insert cost.
> When creating a new index on an existing table, Firebird reads the table,
> sorting by they index key and builds the index from the sorted output.
>

High/Low is a technique of generating primary keys client side in contrast to generating keys server side with fb generators. Here is a link to a discussion that probably could explain it better:
http://stackoverflow.com/questions/282099/whats-the-hi-lo-algorithm

The application I am trying to build is distributed system spread geographically, most of them on slow connection. Hence, the need for client side generated primary keys. If the hi/lo technique works as expected and let's say for example there are a hundred client users doing inserts at random times of the day, then there will be instances where each batch of inserts are not in sequence with other client's batch of inserts. Fast forward three years and the table now holds hundreds of thousands or even a million record, while most or all clients are still working with their initial high values. Woe to the client with the first acquired (least) high value; his inserts would fall rather towards the beginning of the table/index, while the client with the last acquired (highest) high value would fall towards the end of the table/index (lucky?).

If my previous assumptions are correct, then it is worth making an effort to ensure that all inserts fall towards the end of the table/index by prefixing the generated PKs with a value based on the current time, for instance (all clients in the same time zone). More work though, and an integer data type might not suffice to accomodate the now very large value PK. Maybe should use Numeric(18,0)?

OTOH, if you say these are non issue with fb, then maybe I should just consider GUIDs or UUIDs for PKs and forget about the server altogether with regards to PK generation?

> Why is that fast? Two reasons, neither of which as to do with the
> (relatively) low cost of moving index entries in memory.
>
> The first is that random index entries tend to land on random pages in the
> index. Unless you can hold the entire index (and everything else you need)
> in memory, that means that pages will be written and read multiple times.
> One of the key misunderstandings in databases is that an index is a
> good alternative to a sort because the cost of an index look up is K*n,
> while the cost of a sort is L*nlog(n). It's rarely noticed that the value
> of K (the cost of a page read) is huge compared with log(n).
> Filling an index page with ordered values and going on to the next page is
> more efficient that putting one entry on one page, the next on another, and
> so on, even if all the pages are in memory.
>
> The second reason to store rows in the order of their index (i.e. not just
> sorted, but sorted ascending for ascending, descending for descending) is
> prefix compression. Firebird drop the first bytes of a key if they match
> the previous key value. So if you store AAAAAA, AAAAAAB, AAAAAAC, AAAAABA,
> what goes into the index is AAAAAA, 6B, 6C, 4BA. Prefix compression makes
> the indexes dense, meaning that you read fewer index pages to get to the
> data you want. Clearly if you store those rows in reverse order, the first
> key in is AAAAABA. When you store AAAAAAC, the index contains AAAAAAC and
> the second key value becomes 5BA. Each subsequent entry requires a change
> to the next older entry.

Thank you for your excellent explanation.

>
> I based this question on this article:
> >
> > http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database
> > ort/
> >
> >
> I've only scanned the article, but if it offers you a reliable way to turn
> GUIDs into strings (or byte arrays) in ascending order, the algorithm will
> reduce the size of indexes and make inserts faster.
>
> Cheers,
>
> Ann

>
>
> [Non-text portions of this message have been removed]
>