Subject | accent insensitive collation - performance |
---|---|
Author | Geoff Worboys |
Post date | 2016-08-24T04:24:21Z |
Hi all,
Firebird v2.5.6 - superserver. Windows 10 x64.
I was making some changes to a database when I noticed a query
like this:
SELECT TEXTBLOB
FROM TABLEA WHERE TEXTBLOB CONTAINING 'SOME TEXT'
was suddenly taking about 4 x longer than before. (A table of
almost a million records, so it was quite distinct and very
consistent.)
It turns out the difference was in my collation declaration.
The old database had this:
CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;
CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE;
The new database had this:
CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;
CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE
ACCENT INSENSITIVE;
Rebuilt the new database without "accent insensitive" and the
performance matched the old database.
Is the performance hit with ACCENT INSENSITIVE to be expected
in this case?
I have been considering migrating to UTF8 (but keep putting it
off because I've quite a bit of other work to do before that
is possible.) Does anyone know if I will get a similar
performance hit with ACCENT INSENSITIVE if I use UTF8 and the
predeclared UNICODE_CI_AI collation?
I found this tracker item:
http://tracker.firebirdsql.org/browse/CORE-4125
where there was a problem with UNICODE_CI_AI that was fixed
in FB v2.5.3 ... I am supposing the reported ratios mean that
it should be okay.
--
Geoff Worboys
Telesis Computing Pty Ltd
Firebird v2.5.6 - superserver. Windows 10 x64.
I was making some changes to a database when I noticed a query
like this:
SELECT TEXTBLOB
FROM TABLEA WHERE TEXTBLOB CONTAINING 'SOME TEXT'
was suddenly taking about 4 x longer than before. (A table of
almost a million records, so it was quite distinct and very
consistent.)
It turns out the difference was in my collation declaration.
The old database had this:
CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;
CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE;
The new database had this:
CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;
CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE
ACCENT INSENSITIVE;
Rebuilt the new database without "accent insensitive" and the
performance matched the old database.
Is the performance hit with ACCENT INSENSITIVE to be expected
in this case?
I have been considering migrating to UTF8 (but keep putting it
off because I've quite a bit of other work to do before that
is possible.) Does anyone know if I will get a similar
performance hit with ACCENT INSENSITIVE if I use UTF8 and the
predeclared UNICODE_CI_AI collation?
I found this tracker item:
http://tracker.firebirdsql.org/browse/CORE-4125
where there was a problem with UNICODE_CI_AI that was fixed
in FB v2.5.3 ... I am supposing the reported ratios mean that
it should be okay.
--
Geoff Worboys
Telesis Computing Pty Ltd