Subject | Data Streaming -- New Message Format |
---|---|
Author | Jim Starkey |
Post date | 2005-03-04T17:11:27Z |
I've been thinking about a new data streaming message format for use in the
new API, the remote protocol, BLR, and, maybe at some time in the
future, the ODS. Here are some ideas to bounce around.
The data streaming message format would be used bidirectionally across
the new API and plumbing. The first requirement is platform
independence to allow a data message to be transmitted across machine
architectures without reformatting or conversion. The second
requirement is density to avoid transmitting unnecessary fluff.
Compression friendliness is a big plus. The third requirement is
extensibility to allow the introduction of new data types without
impacting existing plumbing.
It is intended that the primary (if not sole) client of the message
format will be a data boat class, similar to the IscDbc and Vulcan Values
classes, that accepts and serves data items (converting if necessary). This
class would serialize/deserialize itself using the data streaming message
format.
Here is a first cut on the format:
<message> := <version> <item-count> [ <data-item> ]...
<data-item> := dstNull
:= dstZero
:= dstOne
:= dstNegativeOne
:= dstNumber <number>
:= dstNegativeNumber <number>
:= dstScaledNumber <scale> <number>
:= dstNegativeScaledNumber <scale> <number>
:= dstDoubleFloat <float>
:= dstUTF8 <length> <utf-8-characters>
:= dstOpaque <length> <bytes>
:= dstDate <precision> <number>
:= dstNegativeDate <precision> <number>
<number> := <unflagged-7-bit-byte>
:= <flagged-7-bit-byte> <number>
<version> := <unsigned-byte>
<scale> := <signed-byte>
<precision> := <signed-byte>
<item-count> := <number>
<length> := <number>
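As a concrete sketch of the layout above, here is how a two-item message (a null followed by the number 42) might be assembled. Note that the post doesn't assign byte codes to the dst* tokens, so the enum values here are purely illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical tag values -- the grammar above names the dst* tokens but
// does not assign byte codes, so these are illustrative only.
enum DstType : uint8_t {
    dstNull = 0, dstZero = 1, dstOne = 2, dstNegativeOne = 3,
    dstNumber = 4, dstNegativeNumber = 5
};

// Builds a version-1 message with two items: an explicit null and the
// positive number 42. Numbers below 128 fit in a single unflagged
// 7-bit byte, so no continuation flag is needed here.
std::vector<uint8_t> buildSampleMessage()
{
    std::vector<uint8_t> msg;
    msg.push_back(1);           // <version>
    msg.push_back(2);           // <item-count> (fits in one byte)
    msg.push_back(dstNull);     // first item: explicit null
    msg.push_back(dstNumber);   // second item: positive number...
    msg.push_back(42);          // ...value 42, single unflagged byte
    return msg;
}
```

The whole message is five bytes, which illustrates the density goal: no per-item length words or padding, just a tag and the minimal payload.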
Note that most (not all) numbers are passed as variable length byte
streams (7 bits per byte plus a continuation flag), rather than as shorts,
longs, int64s, etc. The purpose is three-fold. First, it avoids
any restrictions on the size of numbers. Second, it minimizes message
length. Third, it eliminates the idea of a specific numeric type in favor
of an abstract "number". Variable length integers don't handle signs
well, so separate types are defined for positive and negative numbers.
For brevity and compression friendliness, zero, minus one, and one are
special cased.
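The 7-bits-plus-continuation-flag scheme might look like the following. The byte order (least- vs. most-significant group first) is my assumption; the post doesn't specify it, and least-significant-first is used here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned value as 7 payload bits per byte; a set high bit
// (0x80) flags that more bytes follow. Sign is carried by the item tag
// (dstNumber vs. dstNegativeNumber), so only magnitudes are encoded.
std::vector<uint8_t> encodeNumber(uint64_t value)
{
    std::vector<uint8_t> bytes;
    do {
        uint8_t b = value & 0x7F;
        value >>= 7;
        if (value)
            b |= 0x80;          // continuation flag: more bytes follow
        bytes.push_back(b);
    } while (value);
    return bytes;
}

uint64_t decodeNumber(const std::vector<uint8_t>& bytes)
{
    uint64_t value = 0;
    int shift = 0;
    for (uint8_t b : bytes) {
        value |= uint64_t(b & 0x7F) << shift;
        shift += 7;
        if (!(b & 0x80))
            break;              // unflagged byte terminates the number
    }
    return value;
}
```

With this shape, values up to 127 cost one byte and the encoding grows a byte per 7 bits of magnitude, with no fixed upper bound on the value.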
A null is passed as an explicit type.
I have rolled date and timestamp into one type with a stated precision.
I haven't decided what precision actually means, but a decimal scale
factor with days == 0 is probably a good candidate. There isn't a type
for time, which is properly a scalar rather than a type, but I'm prepared
to lose this one (again!) if pressed.
I don't know the best way to handle blobs. I'm inclined to include
them inline as either dstUTF8 or dstOpaque as appropriate, and skip the
extra round trips. Alternatively, blob ids could be passed as dstOpaque
with blob fetch a separate operation. When I invented them, blobs were
conceptually larger than available memory, so a separate stream made
sense. Now, even large blobs are small compared with memory sizes, and
the balance between server round trips and memory utilization has
changed. This should be a good question to debate.
I don't have any particularly clever ideas about passing floating
values, so assume the obvious.
Thoughts? Errors? Omissions?
--
Jim Starkey
Netfrastructure, Inc.
978 526-1376