Subject Re: [firebird-support] Writing UTF16 to the database
Author Ann W. Harrison
Olivier Mascia wrote:
>
> While we are at this, Ann, may I ask you to confirm for us all how
> UNICODE_FSS is stored internally (in memory and on disk)?

I'm embarrassed to say it, but I don't know with certainty, and worry
that the answer may be "it depends". Note that the text fields in the
system tables are nominally UNICODE_FSS, but normally contain ASCII,
stored one byte per character.

If you define a data column as UNICODE_FSS, the system allocates three
bytes of storage for each character of declared length. System table
text fields allocate one byte per character.
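
As a rough illustration of the allocation rule above, here is a small Python sketch. The names unicode_fss_allocation and BYTES_PER_CHAR_UNICODE_FSS are made up for this example and are not Firebird identifiers. It only multiplies the declared length by three, and then checks that a few sample characters (up to U+FFFF) need at most three bytes each in UTF-8, which is consistent with a three-byte allowance per character.

```python
# Assumption taken from the description above: a user-defined UNICODE_FSS
# column reserves 3 bytes for every character of declared length.
BYTES_PER_CHAR_UNICODE_FSS = 3


def unicode_fss_allocation(declared_chars: int) -> int:
    """Bytes reserved for a UNICODE_FSS column of the given declared length."""
    return declared_chars * BYTES_PER_CHAR_UNICODE_FSS


# Sample characters: ASCII, Latin-1 range, a 3-byte character, and U+FFFF.
samples = ["A", "\u00e9", "\u20ac", "\uffff"]

# Number of bytes each sample occupies when encoded as UTF-8.
encoded_lengths = [len(ch.encode("utf-8")) for ch in samples]

print(unicode_fss_allocation(10))
print(encoded_lengths)
```

So a column declared as 10 characters would reserve 30 bytes under this rule, even if the stored text is pure ASCII and only needs 10.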

I am reasonably certain (90% confident) that the memory and on-disk
storage are the same, that the format uses a variable number of bytes
per character, and that it follows UTF-8 rules:

For single-byte characters, the first (high-order) bit is zero.

For multi-byte characters, the lead byte has its first n bits set to
one, where n is the length of the character in bytes. Those n bits
are followed by a zero bit. Each subsequent byte has its first two
bits set to 10.
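
The bit rules above can be sketched in Python as follows. This is only an illustration of the UTF-8 lead-byte and continuation-byte patterns, not Firebird code; the function names are hypothetical, and the cap of four bytes per character is my assumption based on current UTF-8 (a UNICODE_FSS column, at three bytes per character, would presumably never need more than three).

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx (a trailing byte)."""
    return (byte & 0xC0) == 0x80


def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes the character starting at lead_byte occupies."""
    if (lead_byte & 0x80) == 0:
        return 1  # 0xxxxxxx: single-byte character
    # Count the leading one bits; the count is the sequence length n.
    n = 0
    mask = 0x80
    while lead_byte & mask:
        n += 1
        mask >>= 1
    if n == 1:
        raise ValueError("continuation byte cannot start a character")
    if n > 4:  # assumption: modern UTF-8 stops at 4 bytes
        raise ValueError("invalid lead byte")
    return n


def is_well_formed_char(data: bytes) -> bool:
    """Check one encoded character against the lead/continuation rules."""
    if not data:
        return False
    try:
        n = utf8_sequence_length(data[0])
    except ValueError:
        return False
    return len(data) == n and all(is_continuation(b) for b in data[1:])
```

For example, utf8_sequence_length(0xE2) gives 3, matching the three bytes E2 82 AC that encode the euro sign. Note that this checks only the bit patterns described above; it does not reject overlong encodings or surrogate code points.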

Regards,


Ann