firebird-architect - Re: [Firebird-Architect] Data Stream Encoding

Subject	Re: [Firebird-Architect] Data Stream Encoding
Author	Jim Starkey
Post date	2005-05-03T11:43:57Z

Steen Jansdal wrote:

I was hoping to hear from you.

>
>Excellent suggestion, but can you explain a little about the doubles.
>Isn't it difficult to compress them?
>

Trailing zeros are eliminated. For doubles that represent "feasible"
integers, this is very effective. For doubles that use the full
precision, it probably doesn't do much at all.

>
>Many tables contain a lot of nulls. A further optimization could be
>to invent a DataStreamCode that says two, three, four,... nulls follow.
>The encoding and decoding is a little more difficult but not much.
>
> edsTwoNulls
> edsThreeNulls
> edsFourNulls
>

It slightly complicates decoding in an important case. Assume we were
using the scheme to encode records on disk. To fetch value n, it is
necessary to parse value 0 to n, but no more. One way to handling this
is to keep an array of known offsets to encoded values. If a single
code represents multiple values, maintaining state requires either more
memory or more code in a performance critical space.

The point about repeating values, however, is well taken. I think the
way to address it is some form of post encoding decompression of the
stream itself. It's fairly clear to me that the exisiting run length
encoding is completely worthless with data stream encoding (other than
for your case of repeatings nulls or trailing blanks). I would like to
think about cheap compression schemes that are effective on this type of
data. I don't have any candidates other than variations on LZW, but
that is so tied up in patents that I don't want to go there. Perhaps
some form of pre-computed zip algorithm, but I really don't know.

Going back to trailing blanks, I thought about an explicit class of
string types that encoded the number of trailing blanks. I finally
decided that fixed length strings are such an artifact of the past that
there was no reason to waste code space on them going forward. If there
are opinions to the contrary, I'd like to hear them.

>
>