| Subject | Re: [firebird-support] UTF8 in firebird ? |
|---|---|
| Author | Geoff Worboys |
| Post date | 2012-01-07T23:38Z |
Hi Ann,
Ann Harrison wrote:
> Dear Geoff, [...]
>>
>> I am far from convinced that your testing reveals real-world
>> differences between the current UTF8 implementation vs any
>> practical alternative (which neither ISO_8859 nor OCTETS
>> represent).
> Stephane's tests show that when you carry a lot of extra
> space around in strings, it slows Firebird somewhat. I think
> his example was unusually severe because he over-estimates
> the number of characters in his fields.
Thanks for the info on compression techniques. Note, however,
that my point was that Stephane's tests only show that, with
the current implementation, UTF8 in Firebird is slower than a
single-byte character set. To me this is like saying that a
bucket gets heavier when you put more water in it. Of course
it gets slower; the real question is how much slower in
realistic situations, and whether any practical alternative
would do better (not some contrived example that has none of
the size and transcoding overheads of UTF8/Unicode).
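To make the size side concrete: here is a minimal sketch in
plain Python (no Firebird involved; the record_buffer_size
helper is hypothetical, written for this post) of one rule
that drives Stephane's numbers. Firebird reserves up to 4
bytes per character for UTF8 fields, and a CHAR is
space-padded to its declared length, so an over-declared
CHAR(80) carries four times the bytes in UTF8 that it does in
a single-byte character set.

```python
# Illustrative only: record_buffer_size is a made-up helper
# modelling the worst-case buffer rule -- 4 bytes/character
# for UTF8, 1 byte/character for a single-byte charset.

def record_buffer_size(declared_chars: int, charset: str) -> int:
    bytes_per_char = 4 if charset == "UTF8" else 1
    return declared_chars * bytes_per_char

# A short ASCII value in an over-declared CHAR(80) field:
value = "Ann"
print(len(value.encode("latin-1")))         # 3 bytes of content
print(record_buffer_size(80, "ISO8859_1"))  # 80-byte buffer
print(record_buffer_size(80, "UTF8"))       # 320-byte buffer,
                                            # mostly padding
```

That 320-vs-80 gap is exactly the over-estimated field length
Ann mentions: the declared size, not the data, sets the raw
buffer cost.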
While you are undoubtedly correct that the RLE compression
used by Firebird is not as effective as it could be, we
cannot read Stephane's tests as a direct measurement of the
impact of that encoding. Many factors are involved, and they
would need to be analysed before worrying too much about one
form of compression versus another; and in any case the
performance comparison should be between Unicode
implementations rather than Unicode versus single-byte
character sets.
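For anyone not familiar with run-length encoding, here is a
generic sketch in plain Python of the idea (the simple
(count, byte) form, not Firebird's actual signed control-byte
on-disk format). It shows why those padded UTF8 buffers cost
far less on disk than their raw size suggests: the trailing
spaces collapse into a handful of pairs.

```python
def rle_compress(data: bytes) -> bytes:
    """Naive RLE: each run of a repeated byte becomes a
    (count, value) pair, with runs capped at 255 so the count
    fits in one byte. A sketch of the idea only; Firebird's
    compressor also handles literal stretches efficiently."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while (i + run < len(data)
               and data[i + run] == data[i] and run < 255):
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

# A UTF8 CHAR(80) buffer holding "Ann": 320 bytes raw, nearly
# all of it space padding, which RLE squeezes right down.
padded = "Ann".encode("utf-8").ljust(320, b" ")
print(len(padded), len(rle_compress(padded)))   # 320 8
```

The catch, and one of the many possible factors above, is
that such savings apply to the stored record; once it is
decompressed for use in memory, the full-width buffer is
back, so disk size alone does not settle the comparison.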
One thing that Stephane does say in a recent post is:
> yes off course, but i was also curious how many people was
> aware that their UTF8 database can be much more slower than
> equivalent ASCII database :) i m sure most of them don't
> know about it when they choose UTF8 ...
I imagine that most experienced developers understand that
UTF8 (and Unicode in general) is going to have a resource and
performance impact on their application. However, Stephane's
tests may prompt some to review whether their applications
actually need UTF8 storage. To that extent, at least, this is
a useful conversation.
--
Geoff Worboys
Telesis Computing Pty Ltd