Subject Re: [Firebird-Architect] A Fresh Look at Collations
Author Alexander Peshkoff
On Monday 21 June 2010 20:26:05 Jim Starkey wrote:
> Adriano dos Santos Fernandes wrote:
> > On 21/06/2010 12:08, Dimitry Sibiryakov wrote:
> >> 21.06.2010 16:46, Sergey Mereutsa wrote:
> >> AWH>> Why do you say UTF-8 is slow?
> >>
> >>> Because you cannot count string length, for example, without walking
> >>> the whole string: each char in UTF-8 (if we speak about its native
> >>> representation) can be from 1 to 6 bytes long.
> >>
> >> But is counting symbols such a frequent operation that we should care
> >> about its speed?
> >
> > At least when data comes from the user, it must be validated and its
> > characters counted (when using length-constrained strings, aka [VAR]CHAR).
> >
> > If you don't do these things, it will be like InterBase and FB 1.5. It is
> > then better to call it bytes instead of UTF-8.
>
> That's a client-side operation. If the server doesn't trust the client,
> it can validate incoming UTF-8 strings, but even that is a cheap operation.
>
> Something that I should have mentioned in passing, incidentally, is that
> all strings in NimbusDB are arbitrary length, so there aren't any
> issues of logical versus physical string lengths.

I also do not see a big loss in validating data coming from the client.
Networks still seem to be a bit slower than CPUs. What is more
interesting, to my mind, is how UTF-8 strings are planned to be stored in
the database. Will text stored on disk be compressed?
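
To make the validation cost concrete, here is a minimal sketch of a single
pass that both validates a UTF-8 byte string and counts its characters. It is
not Firebird code: the function name utf8_validate_and_count is hypothetical,
and it assumes the RFC 3629 definition of UTF-8 (at most 4 bytes per code
point, no overlong forms, no surrogates) rather than the older 1-to-6-byte
definition mentioned above.

```python
def utf8_validate_and_count(data: bytes) -> int:
    """Return the number of code points in data; raise ValueError if malformed.

    Hypothetical helper for illustration, assuming RFC 3629 UTF-8.
    """
    count = 0
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # The lead byte alone determines the sequence length.
        if lead < 0x80:
            length, cp, min_cp = 1, lead, 0x0
        elif 0xC2 <= lead <= 0xDF:
            length, cp, min_cp = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, cp, min_cp = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:
            length, cp, min_cp = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Every following byte must look like 10xxxxxx.
        for j in range(i + 1, i + length):
            b = data[j]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogates, and out-of-range values.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        count += 1
        i += length
    return count
```

Each byte is touched once, so the cost is linear in the byte length of the
string, which is the point made above about validation being cheap relative
to the network transfer. A server enforcing a [VAR]CHAR(n) limit could compare
the returned count against n in the same pass.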