Subject Re: [firebird-support] Re: Alternative for UNIQUE constraint
Author Ann W. Harrison
magic1278,

>
>> (CHAR (N) is the wrong choice of data type for this, by the by,
>> although it doesn't affect the max. size of the index).
>
> Why is that?

Because CHAR fields preserve trailing blanks and cause problems
in formatting and in LIKE comparisons. Generally, use CHAR when
your values are very short - two or three characters - and when
all values have the same number of characters. For example,
use varchar for names, but use char for US Social Security numbers.


> I thought having a fixed-length string would be better since
> look ups would just require some simple arithmetic.

Firebird's index keys are much more complicated than that,
especially for compound keys.

First, for all CHAR and VARCHAR fields, the trailing blanks
are snipped off. Then the field is converted to a format that
sorts bytewise according to the declared collation. For
example, a collation that says that 'a' and 'A' are equivalent
would convert all 'a' characters to 'A'. But if the collation
says that 'aa' sorts before 'Aa' which sorts before 'ab',
additional magic is required which adds to the length of
the key. Depending on the complexity of the collation,
the key may triple in length. Somewhere I have a nice essay
from David Schnepper, explaining the handling of three level
collations (case, accent, and spaces)...

For all numeric and floating point fields, including dates
and timestamps but but not including Int64, the value is
converted to double precision floating point, sign inverted,
reordered to sort correctly bytewise, and trailing zeros are
suppressed.

For compound keys, all fields except the last are padded with
blanks or zeros to a multiple of four bytes, then after every
four bytes a fifth byte is added containing the position of
the field in the key. The first field has zeros inserted,
the second has ones, etc.

Finally, when storing a key in an index, Firebird compresses
prefixes. That means that if you store 'abcd' between 'ab'
and 'abcde' in a sequence that starts with 'aa' and ends with
'ac', what you actually get is

'aa', 1'b', 2'cd', 4'e', 1'c'

So no, you're not saving space or processing by using CHAR
fields as keys.


Cheers,

Ann