Subject Re: [Firebird-Architect] Data Stream Encoding
Author Steen Jansdal
Jim Starkey wrote:
> In early March I posted some ideas for a structured data message encoding
> for use in the lower layers of a new primary database API. Steen
> Jansdal made a few suggestions that caused me to rethink the issue from
> the beginning.
>
> As a quick refresher, the encoding scheme is used to transmit an ordered
> list of values through the various pieces of plumbing (client API,
> y-valve, line protocol, sql, blr and back) as a simple message. The
> requirements are:
>
> 1. Platform independent
> 2. Dense
> 3. Cheap to encode
> 4. Cheap to decode
>
> My original proposal had a very small number of data type codes with
> numbers (both lengths and values) encoded as variable length binary.
> The new version:
>
> 1. Abandons variable length binary altogether
> 2. Uses the code byte to specify either the data length or the byte
> length of the data length
> 3. Greatly expands the range of integer values encoded in the code
> byte itself
>
> The new scheme, happily, is denser, easier to encode, and easier to decode.
>
> A summary of the codes with a little annotation (as opposed to
> Ann-notation):
>
> enum DataStreamCode
> {
> edsNull = 1,
>
> edsIntMinus10 = 10, // literal integers -10 to 31
> ...
> edsInt0,
> edsInt1,
> ...
> edsInt31,
>
> edsIntLen1 = 60, // signed integer of length 1
> edsIntLen2,
> edsIntLen3,
> edsIntLen4,
> edsIntLen5,
> edsIntLen6,
> edsIntLen7,
>
> edsUtf8Len0 = 70, // Utf8 string of length 0
> ...
> edsUtf8Len39,
>
> edsUtf8Count1 = 120, // Utf8 with one count byte
> edsUtf8Count2,
> edsUtf8Count3,
> edsUtf8Count4,
>
> edsOpaqueCount1 = 130, // Opaque with one count byte
> edsOpaqueCount2,
> edsOpaqueCount3,
> edsOpaqueCount4,
>
> edsDoubleLen2 = 140,
> edsDoubleLen3,
> edsDoubleLen4,
> edsDoubleLen5,
> edsDoubleLen6,
> edsDoubleLen7,
>
> edsDaysLen1 = 150, // days since January 1, 1970
> edsDaysLen2,
> edsDaysLen3,
> edsDaysLen4,
>
> edsMillisLen1 = 160, // milliseconds since January 1, 1970
> edsMillisLen2,
> edsMillisLen3,
> edsMillisLen4,
> edsMillisLen5,
> edsMillisLen6,
> edsMillisLen7,
> edsMillisLen8,
> };
>
> I don't really know what range of integer builtins or utf8 lengths make
> sense. I plan to sample a variety of existing databases to gather some
> data.
>
> I have left holes that could be used to extend some range or add
> additional types as needed down the road.
>
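To check my reading of the integer codes, here is a minimal sketch of an encoder. Only the two code values come from the enum above. The function names, the choice of most-significant-byte-first order, and the use of two's complement for the edsIntLenN payload are my assumptions, since the proposal does not state them.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Code values taken from the quoted enum; everything else is an assumption.
const int edsIntMinus10 = 10;   // literal integers -10 to 31
const int edsIntLen1 = 60;      // signed integer of length 1 (up to length 7)

// Smallest number of bytes (1..8) that holds v as two's complement.
static int signedByteLength(int64_t v)
{
    for (int n = 1; n < 8; ++n)
    {
        int64_t limit = int64_t(1) << (8 * n - 1);
        if (v >= -limit && v < limit)
            return n;
    }
    return 8;
}

// Appends the encoding of v to out.
// ASSUMPTION: payload bytes are written most significant byte first.
void encodeInt(int64_t v, std::vector<uint8_t>& out)
{
    if (v >= -10 && v <= 31)
    {
        // The value lives in the code byte itself: -10 -> 10, 0 -> 20, 31 -> 51.
        out.push_back(uint8_t(edsIntMinus10 + (v + 10)));
        return;
    }
    int n = signedByteLength(v);
    if (n > 7)
        throw std::range_error("no code listed for 8-byte integers");
    out.push_back(uint8_t(edsIntLen1 + n - 1));
    for (int i = n - 1; i >= 0; --i)
        out.push_back(uint8_t(uint64_t(v) >> (8 * i)));
}
```

With these assumptions 0 costs one byte, 32 costs two, and 1000 costs three. The enum stops at edsIntLen7, so the sketch rejects values that need all eight bytes rather than guess at a code for them.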

Another idea for reducing the amount of data sent over the wire:
identical text strings could be sent only once. The second time a text
string is to be sent, a reference to the first occurrence is sent instead.
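
Roughly what I have in mind, as a sketch only. The edsUtf8Ref code and its value 125 are hypothetical (I just picked one of the holes in the enum), as are the StringWriter class and the one-byte reference index. The two edsUtf8 code values are from the proposal.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

const int edsUtf8Len0 = 70;     // from the quoted enum: lengths 0 to 39
const int edsUtf8Count1 = 120;  // from the quoted enum: one count byte
const int edsUtf8Ref = 125;     // HYPOTHETICAL: not part of the proposal

class StringWriter
{
public:
    // Appends s to out: a literal the first time, a reference after that.
    void write(const std::string& s, std::vector<uint8_t>& out)
    {
        auto it = seen.find(s);
        if (it != seen.end())
        {
            out.push_back(uint8_t(edsUtf8Ref));
            out.push_back(uint8_t(it->second));   // index of first occurrence
            return;
        }
        if (s.size() > 255)
            throw std::length_error("sketch handles at most one count byte");
        if (s.size() <= 39)
            out.push_back(uint8_t(edsUtf8Len0 + s.size()));
        else
        {
            out.push_back(uint8_t(edsUtf8Count1));
            out.push_back(uint8_t(s.size()));
        }
        out.insert(out.end(), s.begin(), s.end());
        if (seen.size() < 256)                    // index must fit in one byte
            seen.emplace(s, seen.size());
    }

private:
    std::unordered_map<std::string, std::size_t> seen;  // string -> order of first use
};
```

The decoder would have to build the same table in the same order, adding every literal string it reads, so that an index means the same thing on both ends. A reference costs two bytes here, so it only saves space for strings of two bytes or more.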

IIRC the default Java serialization does something like that.

Steen