Subject Re: [Firebird-Architect] Re: [firebird-support] Writing UTF16 to the database
Author Jim Starkey
Arno Brinkman wrote:

>I wonder how well the compression (RLE) works on UTF-8 / UTF-16 or any
>other character encoding. Using another kind of compression could also enlarge
>the size of record-versions.
>
>
>
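Run-length encoding is most useful for binary zeros and trailing blanks.
UTF-8 should have almost no effect on it at all. UTF-16 will effectively
block compression of successive blanks, since each blank becomes a 0x20
byte paired with a 0x00 byte and no two adjacent bytes repeat.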
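
To illustrate the point above, here is a minimal sketch of a byte-wise run-length encoder. It is a toy, not Firebird's actual on-disk format: the token layout, the `min_run` threshold, the unbounded run counts, and the names `rle_bytes` and `encoded_size` are all assumptions made for the example.

```python
def rle_bytes(data: bytes, min_run: int = 3) -> list:
    """Toy byte-wise RLE (hypothetical format, not Firebird's).

    Returns tokens of the form ("run", byte_value, count) for runs of at
    least min_run identical bytes, and ("lit", bytes) for everything else.
    """
    tokens = []
    literal = bytearray()
    i = 0
    while i < len(data):
        # Find the end of the run of identical bytes starting at i.
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if j - i >= min_run:
            if literal:
                tokens.append(("lit", bytes(literal)))
                literal = bytearray()
            tokens.append(("run", data[i], j - i))
        else:
            literal.extend(data[i:j])
        i = j
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def encoded_size(tokens: list) -> int:
    """Assumed cost model: one control byte per token, plus one byte for
    a run's value or the raw bytes of a literal."""
    size = 0
    for token in tokens:
        size += 2 if token[0] == "run" else 1 + len(token[1])
    return size


text = "abc" + " " * 20                  # a short value padded with blanks
utf8 = text.encode("utf-8")              # 23 bytes, the blanks form one run
utf16 = text.encode("utf-16-le")         # 46 bytes, 0x20 0x00 pairs alternate

print(len(utf8), encoded_size(rle_bytes(utf8)))     # 23 6
print(len(utf16), encoded_size(rle_bytes(utf16)))   # 46 47
```

Under these assumptions the UTF-8 value shrinks because the trailing blanks collapse into a single run, while the UTF-16 value contains no repeated adjacent bytes and ends up one byte larger than the input.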

Some thought might be given to changing the unit of compression from 8
to 16 bits. Small compressible things like an isolated zero 32-bit
integer would no longer get compressed, so compression would be less
effective. On the other hand, disks are now free and cache sizes are
immensely larger, so it probably wouldn't make much difference for
typical applications.
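
The trade-off described above can be sketched by counting runs with different unit sizes. This is an illustration only: the helper name `count_runs` and the `min_run` threshold of three units are assumptions, not a description of how Firebird decides what to compress.

```python
def count_runs(data: bytes, unit: int, min_run: int = 3) -> list:
    """Split data into fixed-size units and return (unit_value, count)
    for every run of at least min_run identical units.

    unit=1 models byte-wise RLE, unit=2 models a 16-bit compression unit.
    """
    units = [data[k:k + unit] for k in range(0, len(data), unit)]
    runs = []
    i = 0
    while i < len(units):
        # Find the end of the run of identical units starting at i.
        j = i
        while j < len(units) and units[j] == units[i]:
            j += 1
        if j - i >= min_run:
            runs.append((units[i], j - i))
        i = j
    return runs


# Twenty UTF-16 blanks: invisible to byte-wise RLE, one run with 16-bit units.
blanks = (" " * 20).encode("utf-16-le")
print(count_runs(blanks, unit=1))   # []
print(count_runs(blanks, unit=2))   # [(b' \x00', 20)]

# An isolated zero 32-bit integer between other data: four zero bytes,
# but only two zero 16-bit units, which is below the assumed threshold.
record = b"\x01\x02" + (0).to_bytes(4, "little") + b"\x03\x04"
print(count_runs(record, unit=1))   # [(b'\x00', 4)]
print(count_runs(record, unit=2))   # []
```

With the wider unit the UTF-16 blanks become compressible again, at the cost of missing short runs such as the lone zero integer.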

Better still, a smarter compression scheme might be the answer. I
chose run-length encoding because compression was cheap and
decompression free. Given that the ratio of CPU speed to disk access
times has changed radically, I think we could do much better. It sounds
like a great project for someone to take on.

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376