Subject Restoring Numeric Precision in Index Keys
Author Jim Starkey
After fretting about it for 28 years, I have a solution to the imprecise
numeric index key problem. And, as a special bonus, it's upwards
compatible.

The solution is dirty simple and should have been obvious -- zap the
lower order byte of key to avoid roundoff issues then tack the remaining
bits on the end of the key before truncating trailing zeros.

The problem is to take arbitrary integers up to 2^63 and convert them to
byte strings that can be compared byte-wise. The solution I came up
with eons ago was to convert all number to double precision floating
point, zip the sign bit for positive numbers and complement the whole
thing for negative numbers, reverse the bytes, and zap trailings zeros.
This works because floating point is normalized and the exponent
precedes the mantissa (strictly speaking, the "signifiand", but who's
counting). Other architectures may require different violence to the
byte order.

As a quick review, IEEE double precision floating point has a 1 bit
sign, an 11 bit excess 1023 exponents, and 52 bits of
mantissa/significand. Since doubles are normalized, there is an
implicit virtual binary "1" to the left of the mantissa for non-zero
values, resulting in 53 bits of precision.

Take as an exercise the largest 64 bit signed integer
(9223372036854775807). As an int64, bit has 63 bits of significance.
As a double, it has an effective exponent of 62 and effectively 53 bits
of significance (52 + magic 1). This translates into 10 lost low end
bits, though the 11th low end bit is questionable due to round off, so
let's zap the last byte of the double and replace it, rounding the 10
bits to 18, and something like:

output[-1] = (value >> 10);
*output++ = (value >> 2);
*output++ = value & 0x03;

with the shifts computed appropriately.

Finally, because we adding precision on right, it's upwards compatible.

Before we leave the subject, let me make another pitch for the alternate
handling of multi-segment: Rather than using segment numbers are fixed
intervals, replace "0" bytes in the key with the sequence { 1, 0 }, "1"
bytes with the sequence { 1, 1 }, and separate segments with a single
"0" byte. This eliminates the segment numbers and pad bytes. It does
require doubling zeros and ones, but neither occurs in strings and only
one out of 128 bytes on average, so it is a big win. It isn't
compatible, but Firebird has always support multiple index formats, so
it could be phased in without an ODS change.

--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376



[Non-text portions of this message have been removed]