Subject Data Stream Encoding
Author Jim Starkey
In early March I posted some ideas for a structured data message encoding
for use in the lower layers of a new primary database API. Steen
Jansdal made a few suggestions that caused me to rethink the issue from
the beginning.

As a quick refresher, the encoding scheme is used to transmit an ordered
list of values through the various pieces of plumbing (client API,
y-valve, line protocol, sql, blr and back) as a simple message. The
requirements are:

1. Platform independent
2. Dense
3. Cheap to encode
4. Cheap to decode

My original proposal had a very small number of data type codes with
numbers (both lengths and values) encoded as variable length binary.
The new version:

1. Abandons variable length binary altogether
2. Uses the code byte to specify either the data length or the byte
length of the data length
3. Greatly expands the range of integer values encoded in the code
byte itself

The new scheme, happily, is denser, easier to encode, and easier to decode.
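
To make point 2 concrete, here is a minimal sketch of how a UTF-8 string might be encoded. It is an illustration, not the actual implementation. The two code values are taken from the summary in this message; the function name encodeUtf8 is hypothetical, and the sketch assumes that lengths count bytes (not characters) and that count bytes are written most significant byte first, neither of which is specified here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Code values taken from the DataStreamCode summary in this message.
const int edsUtf8Len0   = 70;   // byte lengths 0..39 are carried in the code byte
const int edsUtf8Count1 = 120;  // one count byte follows; edsUtf8Count4 is 123

// Hypothetical helper: appends the encoding of s (already UTF-8) to out.
// Returns false if the byte length does not fit in four count bytes.
bool encodeUtf8(const std::string& s, std::vector<uint8_t>& out)
{
    const uint64_t len = s.size();          // ASSUMPTION: length is in bytes

    if (len <= 39) {
        // Short string: the code byte alone specifies the data length.
        out.push_back(uint8_t(edsUtf8Len0 + len));
    } else {
        // Longer string: the code byte specifies how many count bytes follow.
        int n = 1;
        while (n < 4 && (len >> (8 * n)) != 0)
            ++n;
        if ((len >> (8 * n)) != 0)
            return false;                   // would need more than 4 count bytes
        assert(n >= 1 && n <= 4);
        out.push_back(uint8_t(edsUtf8Count1 + n - 1));
        for (int i = n - 1; i >= 0; --i)    // ASSUMPTION: big-endian count
            out.push_back(uint8_t(len >> (8 * i)));
    }

    out.insert(out.end(), s.begin(), s.end());
    return true;
}
```

Under these assumptions a 3-byte string costs one byte of overhead, and a 300-byte string costs three (the code byte plus two count bytes).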

A summary of the codes with a little annotation (as opposed to
Ann-notation):

enum DataStreamCode
{
    edsNull = 1,

    edsIntMinus10 = 10, // literal integers -10 to 31
    ...
    edsInt0,
    edsInt1,
    ...
    edsInt31,

    edsIntLen1 = 60, // signed integer of length 1
    edsIntLen2,
    edsIntLen3,
    edsIntLen4,
    edsIntLen5,
    edsIntLen6,
    edsIntLen7,

    edsUtf8Len0 = 70, // Utf8 string of length 0
    ...
    edsUtf8Len39,

    edsUtf8Count1 = 120, // Utf8 with one count byte
    edsUtf8Count2,
    edsUtf8Count3,
    edsUtf8Count4,

    edsOpaqueCount1 = 130, // Opaque with one count byte
    edsOpaqueCount2,
    edsOpaqueCount3,
    edsOpaqueCount4,

    edsDoubleLen2 = 140,
    edsDoubleLen3,
    edsDoubleLen4,
    edsDoubleLen5,
    edsDoubleLen6,
    edsDoubleLen7,

    edsDaysLen1 = 150, // days since January 1, 1970
    edsDaysLen2,
    edsDaysLen3,
    edsDaysLen4,

    edsMillisLen1 = 160, // milliseconds since January 1, 1970
    edsMillisLen2,
    edsMillisLen3,
    edsMillisLen4,
    edsMillisLen5,
    edsMillisLen6,
    edsMillisLen7,
    edsMillisLen8,
};
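
As an illustration of how the integer codes could be used, the following sketch encodes a signed integer either as a literal code byte (-10 to 31) or as an edsIntLenN code followed by N data bytes. The names encodeInt and signedByteLength are hypothetical. The sketch assumes two's complement data bytes written most significant byte first and the shortest length that holds the value; none of that is stated above. Values that need eight bytes are rejected, since the summary lists only edsIntLen1 through edsIntLen7.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Code values taken from the DataStreamCode summary above.
const int edsIntMinus10 = 10;  // literal -10; literal v is edsIntMinus10 + (v + 10)
const int edsIntLen1    = 60;  // one data byte follows; edsIntLen7 is 66

// Smallest number of bytes (1..8) that holds v as a two's complement value.
int signedByteLength(int64_t v)
{
    int n = 1;
    while (n < 8) {
        const int64_t limit = int64_t(1) << (8 * n - 1);
        if (v >= -limit && v < limit)
            break;
        ++n;
    }
    return n;
}

// Hypothetical helper: appends the encoding of v to out.
// Returns false if v needs more than 7 data bytes.
bool encodeInt(int64_t v, std::vector<uint8_t>& out)
{
    if (v >= -10 && v <= 31) {
        // The value is carried entirely in the code byte.
        out.push_back(uint8_t(edsIntMinus10 + (v + 10)));
        return true;
    }

    const int n = signedByteLength(v);
    if (n > 7)
        return false;                       // no edsIntLen8 in the summary
    assert(n >= 1 && n <= 7);

    out.push_back(uint8_t(edsIntLen1 + n - 1));
    for (int i = n - 1; i >= 0; --i)        // ASSUMPTION: big-endian data bytes
        out.push_back(uint8_t(uint64_t(v) >> (8 * i)));
    return true;
}
```

With these assumptions, 0 encodes as the single byte 20, 32 as the two bytes 60 32, and 128 as the three bytes 61 0 128.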

I don't really know what range of integer builtins or utf8 lengths makes
sense. I plan to sample a variety of existing databases to gather some
data.

I have left holes that could be used to extend some range or add
additional types as needed down the road.
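
One way a decoder might cope with those holes is to map each code byte to a kind through a range table and report anything in a gap as unassigned. This is only a sketch: Kind, Range, kRanges, classify, and intDataBytes are hypothetical names, and the ranges are derived from the code summary above.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind {
    Null, IntLiteral, IntLen, Utf8Len, Utf8Count,
    OpaqueCount, DoubleLen, DaysLen, MillisLen,
    Unassigned              // a hole left for future extension
};

struct Range { int first; int last; Kind kind; };

// Ranges derived from the DataStreamCode summary above.
const Range kRanges[] = {
    {   1,   1, Kind::Null        },
    {  10,  51, Kind::IntLiteral  },  // edsIntMinus10 .. edsInt31
    {  60,  66, Kind::IntLen      },  // edsIntLen1 .. edsIntLen7
    {  70, 109, Kind::Utf8Len     },  // edsUtf8Len0 .. edsUtf8Len39
    { 120, 123, Kind::Utf8Count   },  // edsUtf8Count1 .. edsUtf8Count4
    { 130, 133, Kind::OpaqueCount },  // edsOpaqueCount1 .. edsOpaqueCount4
    { 140, 145, Kind::DoubleLen   },  // edsDoubleLen2 .. edsDoubleLen7
    { 150, 153, Kind::DaysLen     },  // edsDaysLen1 .. edsDaysLen4
    { 160, 167, Kind::MillisLen   },  // edsMillisLen1 .. edsMillisLen8
};

// Returns the kind of value a code byte introduces, or Unassigned for a hole.
Kind classify(uint8_t code)
{
    for (const Range& r : kRanges)
        if (code >= r.first && code <= r.last)
            return r.kind;
    return Kind::Unassigned;
}

// For an edsIntLenN code, the number of data bytes that follow (N).
int intDataBytes(uint8_t code)
{
    assert(classify(code) == Kind::IntLen);
    return code - 60 + 1;
}
```

Extending a range or adding a type later would then only require a new table entry, and an older decoder would see the new codes as unassigned rather than misreading them.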

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376