Subject Re: [Firebird-Architect] Re: Data Streaming -- New Message Format
Author Jim Starkey
Brad Pepers wrote:

>I was wondering the considerations between a binary format like Jim just
>proposed as compared to a text based method. I know some of the
>argument involved but I compare them to the benefits of a human readable
>stream for debugging and lack of worrying about endian-ness and other
>issues and I often opt for text instead of binary in my own protocol
>designs. Perhaps a good argument could be made for needing a binary
>stream but I'd like to hear it and be able to respond. Some things I
>can think of:
>
>1. The text representation would be larger than pure binary which is
>true but by what factor and does it matter as much these days? I
>suppose it always matters to at least think about otherwise you are down
>the slippery slope to bloat. If you had the option to allow compression
>of the text stream when both sides support it, then I don't think it
>would be that different than the binary method.
>
>
Ok, comparisons. The number 1,000,000 requires eight bytes in ascii (7
digits and a delimiter) and three bytes in variable length binary. The
string "This is a string of medium length" (length 33 if I counted
correctly) requires 36 characters in ascii (the length, a delimiter, the
string) and 34 bytes when the length is variable length
binary. The larger the number or the length of the string, the greater the
advantage to a binary encoding.
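
The byte counts above can be checked with a short sketch. It assumes a 7-bits-per-byte variable length integer (LEB128 style), which is only one possible scheme and not necessarily the encoding proposed in this thread. The function names and delimiters are made up for illustration.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer 7 bits at a time, low group first.

    The high bit of each byte is set when another byte follows.
    Assumed scheme for illustration, not the proposed wire format.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


def ascii_number(n: int, delimiter: bytes = b",") -> bytes:
    """Decimal digits followed by a one-byte delimiter."""
    return str(n).encode("ascii") + delimiter


def ascii_string(s: str, delimiter: bytes = b":") -> bytes:
    """Decimal length, a delimiter, then the string bytes."""
    data = s.encode("utf-8")
    return str(len(data)).encode("ascii") + delimiter + data


def binary_string(s: str) -> bytes:
    """Variable length binary length, then the string bytes."""
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


text = "This is a string of medium length"
sizes = {
    "number_ascii": len(ascii_number(1_000_000)),
    "number_binary": len(encode_varint(1_000_000)),
    "string_ascii": len(ascii_string(text)),
    "string_binary": len(binary_string(text)),
}
print(sizes)
```

A real format would probably also spend a byte or so on a type code, which this sketch leaves out on both sides.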

Is this significant? I don't know.

>2. The time spent to convert the numbers, dates, ... into text could be
>a concern but then there is already going to be some encoding going on
>in the binary method proposed so I don't think it would be a large
>difference and would not compare to the things that really take time
>like disk speeds.
>
>
Dates would go as numeric deltas from a base date. Text would almost
certainly be sent as UTF-8, so encoding isn't necessary. I don't see
any benefit to encoding opaque data. If it's ugly in the debugger, who
really cares?
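
A minimal sketch of sending a date as a numeric delta from a base date. The base date (17 November 1858, the Modified Julian Day epoch) and the one-day granularity are assumptions for illustration; this message does not specify either.

```python
from datetime import date, timedelta

# Assumed base date for illustration only.
BASE_DATE = date(1858, 11, 17)


def date_to_delta(d: date) -> int:
    """Number of whole days from BASE_DATE to d (negative if earlier)."""
    return (d - BASE_DATE).days


def delta_to_date(days: int) -> date:
    """Inverse of date_to_delta."""
    return BASE_DATE + timedelta(days=days)


delta = date_to_delta(date(2000, 1, 1))
print(delta, delta_to_date(delta))
```

The resulting integer can then go over the wire with whatever integer encoding the stream already uses, so no text conversion is needed.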

>3. The blob data type would be a problem but it could also be converted
>into a text representation like base64 which along with a compressed
>stream wouldn't be too much of a difference I believe.
>
>So what are the fundamental problems with using a text stream? Is it
>just hide-bound belief in binary streams or are there solid valid
>arguments to be made?
>
>
>
I think it's worth discussing. The bloat for an all-character format isn't gross,
and the respective costs to generate and parse the streams are
comparable. On the other hand, I think the ability to view the data in
ascii isn't worth much at all. The data stream will always be embedded
in a larger protocol that isn't character based, so either a tool or
debugging skill is required in any case. Furthermore, there will
probably be exactly two classes that generate and parse the streams, one
in C++ and one in Java. Adding any runtime overhead to save a couple of
guys at most half an hour over the lifetime of the product doesn't make
a lot of sense to me.
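
For a rough sense of the blob case raised in point 3, the sketch below measures base64 expansion on an arbitrary 3072-byte payload, before any compression. The payload is made up for illustration and is not taken from any real workload.

```python
import base64

# Arbitrary sample blob: every byte value, repeated (3072 bytes).
blob = bytes(range(256)) * 12

encoded = base64.b64encode(blob)
expansion = len(encoded) / len(blob)

print(len(blob), len(encoded), round(expansion, 3))
```

Base64 emits four output characters for every three input bytes, so the text form is about a third larger than the raw blob before compression is considered.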

For non-time critical protocols, I favor exchange of XML packets.
There's no question that unnecessary cycles are squandered in the
process, but in most cases the network latency dwarfs the differences,
and if it takes another millisecond, who's going to be upset? The
database server, on the other hand, is a reasonably busy fellow talking
to lots of clients and batching lots of data packets. Does it need more
work? If the cost were low (I think it is) and the benefit significant
(I don't think it is), then character based might make sense.

So let me drop this back in your court. I'll probably write at least
the C++ code (I don't do GPL, so Roman is on his own), and I don't care
about a character representation for debugging purposes, so the
debugging benefit is nil. Anyone who traces execution by looking at the
bits is nuts. Make a good case for ascii. I don't think we should
reject it out of hand.

By the way, Ann's point about floating point is very well taken. It doesn't
map into readable lossless strings. IEEE went through a lot of pain and
suffering to bludgeon every manufacturer into using exactly the same
floating point representation. Let's not screw it up.
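
A small sketch of the floating point concern, assuming IEEE 754 doubles sent as eight raw bytes. The big-endian byte order and the helper names are assumptions for illustration. It compares the binary round trip with a short, readable decimal rendering and with a 17-significant-digit rendering.

```python
import struct


def to_wire(x: float) -> bytes:
    """Pack a double as 8 bytes, big-endian (byte order is an assumption)."""
    return struct.pack(">d", x)


def from_wire(data: bytes) -> float:
    """Unpack the 8 bytes produced by to_wire."""
    return struct.unpack(">d", data)[0]


x = 0.1 + 0.2

# Binary: the exact bit pattern survives.
binary_ok = from_wire(to_wire(x)) == x

# Short, readable text: loses low-order bits.
short_text = "%.6g" % x
short_ok = float(short_text) == x

# 17 significant digits: round-trips a double, but is less readable.
long_text = "%.17g" % x
long_ok = float(long_text) == x

print(binary_ok, short_text, short_ok, long_text, long_ok)
```

A decimal string can carry a double losslessly if it has enough digits, but then it is neither short nor pleasant to read, and both ends have to agree on the formatting and parsing rules.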


--

Jim Starkey
Netfrastructure, Inc.
978 526-1376