Subject: Re: [IB-Architect] Next ODS change / RLE Encryption
Author: Ivan Prenosil
> There's an easy change to RLE encoding that would increase compression
> substantially for long string values.
>
> Use 0 to 128 to mean the character value.
> Use -2 to -128 to mean a repeat count followed by a character.
> Use -1 to mean that the next two bytes are a repeat count followed by a
> character.
>
> This would allow an empty varchar(32000) to be stored as 4 bytes:
...

Sounds good.

>
> If my assumptions are correct, then -2 is probably also available. I
> thought about using this to lead into a 4 byte compression, but unless this
> is applied to memos or blobs, then there's no value to it (even there, it's
> hard to imagine a document with more than 65536 repeated characters).


If -2 is available, what about

Use -3 to -128 to mean a repeat count
Use -2 to mean that the next byte plus 129 is the repeat count
Use -1 to mean that the next two bytes are a repeat count

That way a run of 129 to 384 identical bytes can be compressed to 3 bytes
(control byte, count byte, repeated byte).
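
A minimal sketch of the three run encodings proposed above, in Python.
The function names encode_run and decode_run are hypothetical, and the
little-endian order of the two-byte count is an assumption (the proposal
does not specify it). Literal (non-repeated) bytes are left out on purpose.

```python
import struct


def encode_run(count: int, value: int) -> bytes:
    """Encode `count` repetitions of the byte `value` (hypothetical helper)."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    if 3 <= count <= 128:
        # Control byte -3..-128: its absolute value is the repeat count.
        return struct.pack("<bB", -count, value)
    if 129 <= count <= 384:
        # Control byte -2: the next byte plus 129 is the repeat count.
        return struct.pack("<bBB", -2, count - 129, value)
    if 385 <= count <= 65535:
        # Control byte -1: the next two bytes are the repeat count.
        # Little-endian is an assumption made for this sketch.
        return struct.pack("<bHB", -1, count, value)
    raise ValueError("run length not representable in this sketch")


def decode_run(data: bytes) -> bytes:
    """Expand one encoded run back into the repeated bytes."""
    control = struct.unpack_from("<b", data, 0)[0]
    if control <= -3:
        return bytes([data[1]]) * -control
    if control == -2:
        return bytes([data[2]]) * (data[1] + 129)
    if control == -1:
        count = struct.unpack_from("<H", data, 1)[0]
        return bytes([data[3]]) * count
    raise ValueError("not a run control byte")
```

Under these assumptions a run of 128 bytes takes 2 bytes, a run of
129 to 384 takes 3 bytes, and anything longer up to 65535 takes 4 bytes.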


> Another alternative would be to look at the second byte and apply this rule:
>
> if (byte1 < 0) and (byte2 < 0) then
> result is byte3 repeated (abs(byte1) * 128 + abs(byte2)) times
>
> This has the advantage of encoding in one less byte (using the fact that we
> know characters are >= 0 and <= 127);

If I want to store my name correctly, I definitely need to store
characters > 127. (Anybody who is curious what my REAL name is:
replace "r" by the "r-with-caron" character, i.e. $F8 in the Latin-2
or Win1250 character set.)

Besides, RLE compression is applied to the whole row (i.e. including
binary data), not only to strings.
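
To illustrate why the "characters are >= 0 and <= 127" assumption does not
hold, here is a small Python check. The helper name as_signed is
hypothetical; the codec names are Python's standard ones for Latin-2 and
Win1250.

```python
import struct


def as_signed(data: bytes) -> list[int]:
    """Reinterpret each byte as a signed 8-bit value (hypothetical helper)."""
    return list(struct.unpack(f"<{len(data)}b", data))


# "\u0159" is r-with-caron; it encodes to $F8 in both character sets.
name_latin2 = "P\u0159enosil".encode("iso8859_2")
name_win1250 = "P\u0159enosil".encode("cp1250")

signed = as_signed(name_latin2)
```

Here signed[1] is -8, so a decoder that treats a negative byte as part of
a repeat count could not tell this data byte from a control byte.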

Ivan
prenosil@...