
Subject	Re: [Firebird-Architect] Data Stream Encoding
Author	Geoff Worboys
Post date	2005-04-29T23:29:11Z

> A summary of the codes with a little annotation (as opposed
> to Ann-notation):

I can't say that I like it a whole lot. In the past I have
found such lengthy enumerations end up giving me problems.
If you need to pack so much into a byte, I would be inclined
to use bit-fields to divide the problem.

e.g.:
3-bits to give 8 basic data types
5-bits to give detail of type encoding

There are various other "clever" alternatives, but the above
would be pretty straightforward. It does limit the size
of the built-in-length types, but it seems to me that you
already have some pretty arbitrary limitations in that
regard.

The advantage of dividing the problem this way is that the
code is easier to split and less inclined to hit problems
when you find you missed something and need to expand.

For example: instead of a switch statement handling 170
different options, or a lengthy series of "if in range then"
tests, you have a relatively small switch statement for up to
8 options that can then call separate utility methods to
handle each of the specific types.

0 = null
1 = integer
2 = utf8
3 = opaque
4 = double/float
5 = chrono (date/time)
6 = reserved for expansion
7 = reserved for expansion
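A minimal sketch of packing and unpacking such a header byte, assuming the base type sits in the high 3 bits and the type-specific detail in the low 5 bits (the bit order and the names `BaseType`, `pack_header` and `unpack_header` are illustrative assumptions, not part of any existing Firebird code):

```python
from enum import IntEnum


class BaseType(IntEnum):
    """3-bit base type codes, following the list above."""
    NULL = 0
    INTEGER = 1
    UTF8 = 2
    OPAQUE = 3
    DOUBLE = 4
    CHRONO = 5
    RESERVED_6 = 6
    RESERVED_7 = 7


def pack_header(base: BaseType, detail: int) -> int:
    """Pack a base type (assumed high 3 bits) and a detail value (low 5 bits) into one byte."""
    if not 0 <= detail <= 0x1F:
        raise ValueError("detail must fit in 5 bits")
    return (int(base) << 5) | detail


def unpack_header(header: int) -> tuple[BaseType, int]:
    """Split one header byte back into (base type, 5-bit detail)."""
    if not 0 <= header <= 0xFF:
        raise ValueError("header must be a single byte")
    return BaseType(header >> 5), header & 0x1F
```

For instance, `pack_header(BaseType.UTF8, 12)` yields `0x4C`, and `unpack_header(0x4C)` gives back `(BaseType.UTF8, 12)`.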

The remaining 5 bits are interpreted as appropriate for the
particular data type. If deemed necessary, item 7 in the list
could be reserved for specific type expansion, where such a
value would indicate a specific type to be defined in the next
5 bits (or whatever).
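A rough sketch of the escape idea, assuming base code 7 means "the low 5 bits select an extended type". The extended type names below are hypothetical placeholders chosen only to show the lookup:

```python
EXTENSION = 7  # base code reserved as an escape to extended types

# Names for base codes 0..6, following the list of base types.
BASE_NAMES = ["null", "integer", "utf8", "opaque", "double", "chrono", "reserved6"]

# Hypothetical extended types, invented purely for illustration.
EXTENDED_NAMES = {0: "decimal", 1: "uuid"}


def type_name(header: int) -> str:
    """Resolve a header byte to a type name, following the escape on base 7."""
    if not 0 <= header <= 0xFF:
        raise ValueError("header must be a single byte")
    base, detail = header >> 5, header & 0x1F
    if base != EXTENSION:
        return BASE_NAMES[base]
    # Base 7: the low 5 bits pick one of up to 32 extended types.
    if detail not in EXTENDED_NAMES:
        raise ValueError(f"unknown extended type {detail}")
    return EXTENDED_NAMES[detail]
```

This keeps the common types in a single byte while leaving 32 slots for later additions without touching the existing codes.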

Done properly, the bit-field layout need only be known at
the very start of decoding, where the byte is divided into two
separate values that are passed into the utility methods. This
way, if you discover later that it is not enough, the issue can be
cleanly redivided into separate bytes or whatever for v2 of
the stream.
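A small sketch of that structure: the header byte is split exactly once in `decode_value`, and each handler only ever sees plain integers. A dictionary stands in for the switch statement. Only two handlers are shown, and the assumption that a short utf8 value stores its byte length directly in the 5-bit detail is illustrative; all names here are hypothetical:

```python
from typing import Callable

# Each handler receives the 5-bit detail, the buffer and the read position,
# and returns (decoded value, new read position).
Handler = Callable[[int, bytes, int], tuple[object, int]]


def decode_null(detail: int, buf: bytes, pos: int) -> tuple[object, int]:
    return None, pos


def decode_short_utf8(detail: int, buf: bytes, pos: int) -> tuple[object, int]:
    # Assumption: the detail holds the string length in bytes.
    end = pos + detail
    if end > len(buf):
        raise ValueError("truncated utf8 value")
    return buf[pos:end].decode("utf-8"), end


HANDLERS: dict[int, Handler] = {0: decode_null, 2: decode_short_utf8}


def decode_value(buf: bytes, pos: int = 0) -> tuple[object, int]:
    header = buf[pos]
    # The bit-field layout is known only here; redividing it for a v2
    # stream would mean changing just this split.
    base, detail = header >> 5, header & 0x1F
    handler = HANDLERS.get(base)
    if handler is None:
        raise ValueError(f"unsupported base type {base}")
    return handler(detail, buf, pos + 1)
```

Decoding `b"\x00\x42hi"` twice in a row yields `None` and then `"hi"`, finishing at position 4.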

Similar divide-and-conquer options exist for the next parts
as well. utf8/opaque could use 1 bit to switch between giving
the length of the string directly and giving the number of
count bytes that follow. Integer could use 1 bit to switch
between implementations.
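One way the utf8/opaque switch might look, as a sketch under stated assumptions: bit 4 of the 5-bit detail is the flag (clear means the low 4 bits are the length itself; set means the low 4 bits count the length bytes that follow), and those count bytes are little-endian. Both choices, and the function names, are hypothetical:

```python
FLAG = 0x10  # assumed: bit 4 of the detail selects "count bytes follow"


def encode_length_detail(length: int) -> tuple[int, bytes]:
    """Return (5-bit detail, trailing count bytes) for a string length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 0x0F:
        return length, b""  # flag clear: length stored inline
    nbytes = (length.bit_length() + 7) // 8
    if nbytes > 0x0F:
        raise ValueError("length too large")
    return FLAG | nbytes, length.to_bytes(nbytes, "little")


def decode_length_detail(detail: int, buf: bytes, pos: int) -> tuple[int, int]:
    """Return (length, new read position)."""
    if not detail & FLAG:
        return detail & 0x0F, pos
    nbytes = detail & 0x0F
    end = pos + nbytes
    if nbytes == 0 or end > len(buf):
        raise ValueError("bad or truncated count bytes")
    return int.from_bytes(buf[pos:end], "little"), end
```

A length of 5 stays inside the header byte, while 300 becomes detail `0x12` followed by the two bytes `2c 01`.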

(Whereas dividing or expanding the scheme you have described
seems to me to be much more difficult.)

Note that you could also take a leaf out of the utf8 book and
use the count-bytes themselves to encode their own length. So
rather than trying to describe count-bytes in the 5 bits you
could do something like:
0..30 = length of utf8 string
31 = self-defining count-bytes exist

On second thought, you have presumably considered such schemes
and discarded them for some reason. Still, I think the code
would be cleaner without such a huge enumeration.

--
Geoff Worboys
Telesis Computing