Subject Re: [Firebird-Architect] Data Stream Encoding
Author Jim Starkey
Geoff Worboys wrote:

>Jim Starkey wrote:
>>Yes, there are a huge number of a cases. But they fall into
>>relatively small number of classes, each is which can be
>>handled with a small, efficient piece of code.
>I have never tested just how efficient the code is when using
>such a huge number of switch elements. I guess it will not be
>very slow, but do wonder if early isolation would be at least
>as fast.
Very efficient. As long as the cases are reasonably dense, the sequence
is: Test upper bound, test lower bound, shift, indexed relative jump.
The VAX (may somebody drive a wooden stake into it's scaly heart) did
the whole thing in one instruction.

The neat thing about blocks of contiguous codes is the difference
between actual code and block start is a) the value itself, b) the
length of the value, or c) the number of length bytes that immediately
follow while the group gives the general type. That's getting a lot of
bang out of an indexed relative jump instruction. Encode is equally
simple and efficient: figure out the class with a few comparisons, add
the length (or whatever) to start of code block, and move in the data.

Your alternative, striking similar to the PDP-11 instruction word,
incidentally, requires masking, shifting, a fair bit of navigation to
get to the specific case. Orthogonality, however, leads to inefficient
use of the code space. While there are obvious benefits in large blocks
of integer literal and short string length code, dedicating the same to
opaque strings, which are infrequent and rarely short, is a waste of
code space.

Something you may have missed is that my scheme can be mapped directly
into your scheme, but not vice versa. In fact, the best way to
implement your scheme is use my decomposition algorithm with code that
map into your bitwise decomposition. So I can't claim any performance
difference. The question really revolves around whether we can achieve
greater density by sizing code blocks based on observed data
demographics than by a rigid allocation of bit fields.

There is also the question of retaining suitable code blocks for things
we haven't thought of or don't yet exist. On the other hand, being able
to answer the demand for an explicit GUID type with "sorry, the code
space is exhausted" is highly attractive, but, alas, short sighted. If
we find that the scheme must be extended to support improbability
quanta, the availability of unused code space will be invaluable.

Thanks for your comments and ideas. I think developers can handle
switch statements with large blocks of labels without that much
difficulty, but you may be right.

[Non-text portions of this message have been removed]