Subject accent insensitive collation - performance
Author Geoff Worboys
Hi all,

Firebird v2.5.6 - superserver. Windows 10 x64.

I was making some changes to a database when I noticed a query
like this:
SELECT TEXTBLOB
FROM TABLEA WHERE TEXTBLOB CONTAINING 'SOME TEXT'

was suddenly taking about 4 x longer than before. (A table of
almost a million records, so it was quite distinct and very
consistent.)

It turns out the difference was in my collation declaration.

The old database had this:

CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;

CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE;


The new database had this:

CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;

CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE
ACCENT INSENSITIVE;


Rebuilt the new database without "accent insensitive" and the
performance matched the old database.

Is the performance hit with ACCENT INSENSITIVE to be expected
in this case?

I have been considering migrating to UTF8 (but keep putting it
off because I've quite a bit of other work to do before that
is possible.) Does anyone know if I will get a similar
performance hit with ACCENT INSENSITIVE if I use UTF8 and the
predeclared UNICODE_CI_AI collation?


I found this tracker item:
http://tracker.firebirdsql.org/browse/CORE-4125
where there was a problem with UNICODE_CI_AI that was fixed
in FB v2.5.3 ... I am supposing the reported ratios mean that
it should be okay.


--
Geoff Worboys
Telesis Computing Pty Ltd