Subject Re: [firebird-support] accent insensitive collation - performance
Author Geoff Worboys
Geoff Worboys geoff@... [firebird-support] wrote:
> Hi all,

> Firebird v2.5.6 - superserver. Windows 10 x64.

> I was making some changes to a database when I noticed a query
> like this:
> SELECT TEXTBLOB
> FROM TABLEA WHERE TEXTBLOB CONTAINING 'SOME TEXT'

> was suddenly taking about 4 x longer than before. (A table of
> almost a million records, so it was quite distinct and very
> consistent.)

> It turns out the difference was in my collation declaration.
[...]
> The new database had this:

> CREATE COLLATION WIN1252_UNICODE
> FOR WIN1252;

> CREATE COLLATION NOCASE
> FOR WIN1252
> FROM WIN1252_UNICODE
> CASE INSENSITIVE
> ACCENT INSENSITIVE;


> Rebuilt the new database without "accent insensitive" and the
> performance matched the old database.

> Is the performance hit with ACCENT INSENSITIVE to be expected
> in this case?

> I have been considering migrating to UTF8 (but keep putting it
> off because I've quite a bit of other work to do before that
> is possible.) Does anyone know if I will get a similar
> performance hit with ACCENT INSENSITIVE if I use UTF8 and the
> predeclared UNICODE_CI_AI collation?

Well, after testing with a UTF8 based build, even UNICODE_CI_AI
is between 3 and 4 times slower than just UNICODE_CI. So it
seems accent insensitive has a definite and significant cost.
Maybe it's obvious to people in the know but it surprises me.

--
Geoff Worboys
Telesis Computing Pty Ltd