Subject Re: [Firebird-Architect] Data Stream Encoding
Author Steen Jansdal
Jim Starkey wrote:
>
> Steen Jansdal wrote:
>
> I was hoping to hear from you.
>
>
>>Excellent suggestion, but can you explain a little about the doubles.
>>Isn't it difficult to compress them?
>>
>
> Trailing zeros are eliminated. For doubles that represent "feasible"
> integers, this is very effective. For doubles that use the full
> precision, it probably doesn't do much at all.
>
>
>>Many tables contain a lot of nulls. A further optimization could be
>>to invent a DataStreamCode that says two, three, four,... nulls follow.
>>The encoding and decoding is a little more difficult but not much.
>>
>> edsTwoNulls
>> edsThreeNulls
>> edsFourNulls
>>
>
> It slightly complicates decoding in an important case. Assume we were
> using the scheme to encode records on disk. To fetch value n, it is
> necessary to parse value 0 to n, but no more.

Please tell me how it's more complicated than your original design.
You also have to parse value 0 to n.

BTW I thought we were talking about a communication protocol, not
a disk record layout. Do they need to be the same?


> One way to handling this
> is to keep an array of known offsets to encoded values. If a single
> code represents multiple values, maintaining state requires either more
> memory or more code in a performance critical space.
>
> The point about repeating values, however, is well taken. I think the
> way to address it is some form of post encoding decompression of the
> stream itself. It's fairly clear to me that the exisiting run length
> encoding is completely worthless with data stream encoding (other than
> for your case of repeatings nulls or trailing blanks). I would like to
> think about cheap compression schemes that are effective on this type of
> data. I don't have any candidates other than variations on LZW, but
> that is so tied up in patents that I don't want to go there. Perhaps
> some form of pre-computed zip algorithm, but I really don't know.
>
> Going back to trailing blanks, I thought about an explicit class of
> string types that encoded the number of trailing blanks. I finally
> decided that fixed length strings are such an artifact of the past that
> there was no reason to waste code space on them going forward. If there
> are opinions to the contrary, I'd like to hear them.
>

Steen