Subject Re: [Firebird-Java] Non-numeric primary key
Author Roman Rokytskyy
> I am toying (at this initial stage) between numeric primary keys and
> string based primary keys.

> The latter because I could used a uuid generator within my EJB
> ( to get a (almost) unique
> key (in both space and time).

> However, after a lot of "google" searches I have yet to understand
> conclusively whether a numeric PK would be more "efficient" to index
> as opposed to a varchar(36) key (particularly when the DB has
> between 100,000 to a millon records to sift through).

This is more Firebird support question than Java one. I have limited
knowledge of the Firebird implementation, but I will try to give you an
answer. For more details you can ask in Firebird-Support group.

The index in Firebird is organized as a tree and index nodex contain packed
keys. For example if you have a key "1111" and "1112" in theory the entry
for the second key will contain a number of matching characters and the
difference, e.g. something like 3:"2". When the search is performed, the
index pages are, so to say, decoded during search.

If your GUID generation algorithm produces something close to the white
noise, the index compression algorithm will not work. This means that more
pages must be read from the disk, therefore the performance is lower. Also
cosider that the smaller key is, the more data fits on the index page, the
less disk reads are needed.

From my limited point of view numeric key will be faster both during inserts
and during search. But ask in Firebird-Support, I know that there was/is
some work done in this area.

> I would appreciate your thoughts on this matter (and also if the
> argument would apply to most RDBMS's).

Personally I have created an entity bean that manages groups of key. Then I
have a utility class that "orders" a set of keys from the bean (usually a
block of 100 keys) and gives them back to the application when needed. When
the block becomes empty, a new one is obtained from the database. See the
implementation details in the "EJB Design Patterns" book from (it is available for download there).

If you still prefer to go with the string key, consider using MD5 digest.
It's only 128-bit wide and so far no key collisions between the digests of
the different strings were found. So generate your 36-char UID and compute
the MD5 digest. This is a big CPU intensive during insert, but you might
save more during selects. But check in the prototype, maybe the index key
compression of the 36-char UIDs will work better than MD5 "white-noise".