Subject Re: [ib-support] Db Design opinions sought - Lookup tables and surrogate PKs
Author Frank Ingermann
Hi Rob,

Rob Schuff wrote:
> greetings folks,
> we are having a spirited discussion at a work about lookup table and whether
> or not they should have a surrogate primary (integer) or whether its better
> to just use the lookup value (or an uppercase'd copy of it) as the primary
> key. I personally have been on both sides of the fence before (normalize to
> the max or be somewhat pragmatic). Anybody care to share their opinions?
> rob

imho it strongly depends on how the lookups will be used - there's pro's and
con's to both approaches:

with the lookup value as a key you
- usually need no joins in the master select to display some meaningful data
(but in practice you often need more than just the uppercased version of the key
itself anyway, so that's a weak argument)
- can easily sort the master table by the lookup value

whereas with the surrogate integer key you
- have a true *identity* for the lookup record (when you change the lookup
display value, you don't have to worry about any FKs pointing to it, even if you
don't do RI by the book)
- have usually less data to store in the master table (depending on the length
of your lookup value, of course, but an int is usually smaller than a char)
- save space for the lookup indexes for the same reason
- have less problems should your data model ever change (let's say you want
another lookup table stuffed between your current master and lookup table - no
problem as long as you make a surrogate int key in the new table as well)
- don't have to take any decisions on charsets and/or collations for the key
fields <bg> - an int is an int in every language and collations don't matter :-)

my (very personal) way of thinking here is: *ALWAYS* use surrogate int keys in
*EVERY* table, even it makes no sense at first glance. The reason i do this is
that i use a lot of in-memory objects that are loaded from tables in the gdb at
runtime. Since every one of my tables starts with an ID (Integer,PK) column, i
wrote a base class with only the ID as a property, a virtual class function that
returns the table name plus methods like ShowRecord, LoadFromDataset(ID
integer), SaveToDataset(ID integer) etc. Then i have a derivate class for every
table that i create with a little self-made source code generator right out of
the GDB'S metadata (so no hand-coding needed): each represents a table's columns
by properties and "knows" how to load itsself from and store itsself into the db
(because it knows its own IDentity)

long story short: i think of integer IDs in tables as the "persistent
representation" of a pointer to an object in memory (which happens to be an
integer, too, and is also a kind of an "identity" for an object...) - but then
again i'm a real object fanatic ;-)

...awaiting other opinions!


when parsers parse, and compilers compile, then why don't objects object?

fingerbirdy - fingerman's door to Firebird