Subject | Re: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams |
---|---|
Author | Ann W. Harrison |
Post date | 2005-05-03T16:08:48Z |
David Johnson wrote:
> With a multibyte character set, whether UTF-8 or UTF-16, we need to get
> away from the idea of allocating maximum storage in bytes for the number
> of characters. A string allocation needs to be exactly as long as it
> needs to be, and no more. Byte size and character size need to be
> understood and treated as distinct measures that may not be directly
> related. i.e. length(x) and sizeof(x) may be distinctly different, and
> should be treated as independent values.

Firebird currently carries both a byte length (RDB$FIELD_SIZE) and a
character length (RDB$CHARACTER_SIZE)*. It allocates fields at the
maximum number of bytes that the character length could possibly
require. That sounds like a lot of wasted space, but the trailing part
of the field is compressed before being written to a database page. It
does waste memory space, but Firebird is rarely criticized as a memory pig.
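
To make the distinction between the two lengths concrete, here is a minimal Python sketch. It assumes a UTF-8 column that reserves 4 bytes per character and a toy scheme that replaces the trailing pad run with a count. The function names and the encoding are illustrative assumptions, not Firebird's actual on-disk format.

```python
def max_byte_length(char_length: int, max_bytes_per_char: int = 4) -> int:
    """Bytes reserved in memory for a field declared with char_length characters."""
    return char_length * max_bytes_per_char


def pad_field(value: str, char_length: int, max_bytes_per_char: int = 4) -> bytes:
    """Encode value as UTF-8 and pad it with spaces to the reserved byte size."""
    encoded = value.encode("utf-8")
    if len(value) > char_length:
        raise ValueError("value exceeds the declared character length")
    return encoded.ljust(max_byte_length(char_length, max_bytes_per_char), b" ")


def compress_trailing(buf: bytes, pad: bytes = b" "):
    """Toy compression: keep the data and replace the trailing pad run with its count."""
    data = buf.rstrip(pad)
    return data, len(buf) - len(data)


# Hypothetical field of 10 characters holding a 5-character value.
field = pad_field("naïve", char_length=10)
data, pad_run = compress_trailing(field)
```

Here the in-memory field occupies the full reserved size, while the part that would reach the page is only the encoded data plus a small description of the pad run.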
> In my not so humble and often underinformed opinion, when allocating
> space, you need to allocate the byte size of the string.

Jim's proposal for data stream encoding would solve the problem and
allow the definition of fields as varchar without a specific length.
> You may need
> to be prepared to split pages on different boundaries than you do today,
> and re-org pages if they get fragmented.

Err.. That's not the way it works. Records can be fragmented across
pages, but pages are fixed length. They don't fragment.
> After an insert or update, re-orging a single page that you have already
> read into memory is relatively cheap, since you will have to rewrite the
> entire page after modification anyways.

And Firebird currently does that - not every time a page is read, but in
the case where a page has space for the record that's about to be stored
but not contiguous space.
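
The "space but not contiguous space" case can be sketched as follows. This is a toy model only: the `Page` class, its slot dictionary, and the append-at-the-end layout are assumptions made for illustration and do not mirror Firebird's real data page layout.

```python
class Page:
    """Toy fixed-length page; records are stored as slot -> (offset, data)."""

    def __init__(self, size: int):
        self.size = size
        self.records = {}

    def total_free(self) -> int:
        """All unused bytes on the page, contiguous or not."""
        return self.size - sum(len(d) for _, d in self.records.values())

    def contiguous_free(self) -> int:
        """Unused bytes after the last record, where this toy page appends."""
        end = max((off + len(d) for off, d in self.records.values()), default=0)
        return self.size - end

    def compact(self) -> None:
        """Slide records together so the free space becomes one run at the end."""
        offset = 0
        for slot in sorted(self.records, key=lambda s: self.records[s][0]):
            _, data = self.records[slot]
            self.records[slot] = (offset, data)
            offset += len(data)

    def store(self, slot: int, data: bytes) -> bool:
        if len(data) > self.total_free():
            return False  # the page genuinely lacks room
        if len(data) > self.contiguous_free():
            self.compact()  # room exists, but it is scattered
        self.records[slot] = (self.size - self.contiguous_free(), data)
        return True


page = Page(100)
page.store(0, b"a" * 40)
page.store(1, b"b" * 40)
del page.records[0]               # leaves a 40-byte hole at the front
stored = page.store(2, b"c" * 50)  # fits only after compaction
```

Compaction runs only when a store needs it, which matches the point that the page is not reorganized every time it is read.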
> Splitting a page is more complex,

Because of compression, records increase in length all the time. If a
record increases in length so that it no longer fits on page, it is
fragmented and the tail is stuck on a different page.
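
A rough sketch of why compressed records change length, and what fragmenting one looks like. The 2-byte cost for describing the pad run and both function names are assumptions for illustration, not Firebird's real compression format.

```python
def compressed_length(value: str, reserved_bytes: int) -> int:
    """Toy model: stored size is the encoded data plus 2 bytes for the pad run."""
    data = value.encode("utf-8")
    if len(data) > reserved_bytes:
        raise ValueError("value exceeds the reserved field size")
    return len(data) + (2 if len(data) < reserved_bytes else 0)


def fragment(stored: bytes, space_on_page: int):
    """Split a stored record into the head that stays and the tail that moves."""
    if len(stored) <= space_on_page:
        return stored, b""
    return stored[:space_on_page], stored[space_on_page:]


# The in-memory field is 40 bytes both times; only the stored size changes.
before = compressed_length("Ann", 40)
after = compressed_length("Ann W. Harrison", 40)

# Hypothetical page with 10 bytes left for a 17-byte stored record.
head, tail = fragment(b"x" * after, space_on_page=10)
```

An update that leaves the declared field size untouched can still grow the stored record, and the overflow goes to another page as a tail fragment.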
> but it is similar to what you do when you append a record and
> you run out of space in the last active page.

Firebird doesn't exactly append records. When it needs to store a new
record, it checks the active pointer page for pages with space, so a new
record can go on a very early page if other data has been deleted from
that page.
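
As a sketch of that placement rule, assume the pointer page can be modeled as a plain list of free byte counts, one per data page. The real structure is different; `PAGE_SIZE` and `choose_page` are hypothetical names used only for this example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def choose_page(free_bytes_by_page: list, record_length: int) -> int:
    """Return the first page with enough room, allocating a new page if none has."""
    for page_number, free in enumerate(free_bytes_by_page):
        if free >= record_length:
            return page_number
    free_bytes_by_page.append(PAGE_SIZE)
    return len(free_bytes_by_page) - 1


# Page 1 has room again because other data was deleted from it.
pages = [0, 120, 0, 500]
early = choose_page(pages, 100)
later = choose_page(pages, 300)
fresh = choose_page(pages, 1000)
```

The new record lands on the earliest page that can hold it, not at the end of the table, and a new page is added only when nothing existing has room.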
Regards,
Ann