Subject | Re: [firebird-support] Writing UTF16 to the database |
---|---|
Author | Ann W. Harrison |
Post date | 2005-02-18T19:50:39Z |
Olivier Mascia wrote:
> While we are at this, Ann, may I ask you to confirm to us all how
> UNICODE_FSS is stored inside (memory and disk storage)?

I'm embarrassed to say it, but I don't know with certainty, and worry
that the answer may be "it depends". Note that the text fields in the
system tables are nominally UNICODE_FSS, but normally contain ASCII,
stored one byte per character.
If you define a data column as UNICODE_FSS, the system allocates three
bytes of storage for each character of declared length. System table
text fields are allocated one byte per character.
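
To make the arithmetic concrete, here is a minimal sketch in Python. It assumes that Python's built-in "utf-8" codec is a fair stand-in for UNICODE_FSS, which is not confirmed here. The helper names `reserved_bytes` and `encoded_bytes` are hypothetical, chosen only for illustration.

```python
# Assumption: Python's "utf-8" codec is used as a stand-in for UNICODE_FSS.
# The three-bytes-per-character figure is the one described above.
MAX_BYTES_PER_CHAR = 3


def reserved_bytes(declared_chars: int) -> int:
    """Bytes reserved for a column declared with the given character length."""
    return declared_chars * MAX_BYTES_PER_CHAR


def encoded_bytes(text: str) -> int:
    """Bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# A column declared with a length of 10 characters reserves 30 bytes.
print(reserved_bytes(10))

# Plain ASCII, such as a system table name, needs one byte per character.
print(encoded_bytes("RDB$RELATIONS"))

# Non-ASCII characters need more: two bytes for U+00E9, three for U+20AC.
print(encoded_bytes("\u00e9"), encoded_bytes("\u20ac"))
```

Under that assumption, three bytes per character is enough for any character in the Basic Multilingual Plane, while ASCII-only text fits in one byte per character.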
I am reasonably certain (90% confident) that the memory and on-disk
storage are the same, that the format uses a variable number of bytes
per character, and that it follows UTF-8 rules:

   For single-byte characters, the first bit is zero.

   For multi-byte characters, the lead byte has its first n bits set
   to one, where n is the length of the character in bytes. Those n
   bits are followed by a zero bit. Subsequent bytes have their first
   two bits set to 10.
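
The bit rules above can be sketched in a few lines of Python. This illustrates the UTF-8 rules as described and is not Firebird's own code. The function names `sequence_length`, `is_continuation` and `split_characters` are hypothetical.

```python
from typing import List


def sequence_length(lead: int) -> int:
    """Return the character length in bytes announced by a lead byte."""
    if lead & 0b10000000 == 0:
        return 1  # first bit is zero: single-byte character
    # Count the leading one bits; that count is the length n.
    n = 0
    mask = 0b10000000
    while lead & mask:
        n += 1
        mask >>= 1
    if n == 1:
        # The pattern 10xxxxxx marks a subsequent byte, not a lead byte.
        raise ValueError("continuation byte cannot start a character")
    return n


def is_continuation(byte: int) -> bool:
    """True if the first two bits of the byte are 10."""
    return byte & 0b11000000 == 0b10000000


def split_characters(data: bytes) -> List[bytes]:
    """Split an encoded byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunk = data[i:i + n]
        if len(chunk) < n or not all(is_continuation(b) for b in chunk[1:]):
            raise ValueError("malformed sequence at offset %d" % i)
        chars.append(chunk)
        i += n
    return chars


# "A" is one byte; the euro sign (U+20AC) is a three-byte sequence.
print(split_characters("A\u20ac".encode("utf-8")))
```

The sketch checks only the bit patterns listed above. It does not reject overlong encodings or sequences longer than three bytes, which a real decoder would also need to consider.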
Regards,
Ann