Subject | Re: RES: [firebird-support] Should i use UTF8 for all character fields in my database? |
---|---|
Author | Fabiano |
Post date | 2012-12-03T16:17:57Z |
Hi all!
I would like to put a note here for other people who are planning to use UTF-8 charsets and ci/ai collations.
The tests i realized before (in a previous message) didn't show the real numbers.
After doing more tests, i realized that searchs using LIKE, STARTING WITH, CONTAINING, etc... have an HUGE performance penalty when using UTF8 and CI/AI collation.
Here are the numbers, based on a table with 5.500 records and searching in a utf8 ci/ai field
select * from pl1_cidades c where c.nome = 'Americana'
78ms
select * from pl1_cidades c where c.nome like 'Americana'
12s75ms (!!!)
select * from pl1_cidades c where c.nome containing 'Americana'
6s208ms (!!!)
And here are the numbers, based on a table with 5.500 records and searching in a iso8851-1 field:
select * from pl1_cidadesa c where c.nome = 'Americana'
78ms
select * from pl1_cidadesa c where c.nome like 'Americana'
98ms
select * from pl1_cidadesa c where c.nome containing 'Americana'
98ms
So, unless there are some plans to improve UTF8 searchs a lot in next firebird versions, for now i think it's a risk to use this charset if your project will need to use SQL instructions like above.
Regards,
Fabiano
I would like to put a note here for other people who are planning to use UTF-8 charsets and ci/ai collations.
The tests i realized before (in a previous message) didn't show the real numbers.
After doing more tests, i realized that searchs using LIKE, STARTING WITH, CONTAINING, etc... have an HUGE performance penalty when using UTF8 and CI/AI collation.
Here are the numbers, based on a table with 5.500 records and searching in a utf8 ci/ai field
select * from pl1_cidades c where c.nome = 'Americana'
78ms
select * from pl1_cidades c where c.nome like 'Americana'
12s75ms (!!!)
select * from pl1_cidades c where c.nome containing 'Americana'
6s208ms (!!!)
And here are the numbers, based on a table with 5.500 records and searching in a iso8851-1 field:
select * from pl1_cidadesa c where c.nome = 'Americana'
78ms
select * from pl1_cidadesa c where c.nome like 'Americana'
98ms
select * from pl1_cidadesa c where c.nome containing 'Americana'
98ms
So, unless there are some plans to improve UTF8 searchs a lot in next firebird versions, for now i think it's a risk to use this charset if your project will need to use SQL instructions like above.
Regards,
Fabiano
--- In firebird-support@yahoogroups.com, "Fabiano" <fabiano@...> wrote:
>
> Hi!
>
> I tried that (actually inserted 50.000 records each table), and here follows the test results:
>
> recreate table test_utf8 (
> column1 varchar(100) character set UTF8 collate unicode_ci_ai);
>
> recreate table test_iso8859_1 (
> column1 varchar(100) character set ISO8859_1 collate pt_br);
>
> -- 1m5s146
> set term !! ;
> execute block
> as
> declare variable i integer;
> begin
> i = 0;
> while (i < 50000) do
> begin
> insert into test_utf8 values (uuid_to_char(gen_uuid()));
> i = i + 1;
> end
> insert into test_utf8 values ('xxx');
> end !!
> set term ; !!
>
> -- 1m4s959
> set term !! ;
> execute block
> as
> declare variable i integer;
> begin
> i = 0;
> while (i < 50000) do
> begin
> insert into test_iso8859_1 values (uuid_to_char(gen_uuid()));
> i = i + 1;
> end
> insert into test_iso8859_1 values ('xxx');
> end !!
> set term ; !!
>
> -- 0s468
> select * from test_utf8 where column1 = 'xxx';
>
> -- 0s561
> select * from test_iso8859_1 where column1 = 'xxx';
>
> -- 1s716, 1s653, 1s841
> select first 1 * from test_utf8 order by column1;
>
> -- 0s671, 0s717, 0s748
> select first 1 * from test_iso8859_1 order by column1;
>
> So, the main performance difference here was ordering the data (2X-3X slower). Other operations seems to have the same performance (althought i didnĀ“t use special characters in the records).
>
> Regards,
>
> Fabiano
>
> --- In firebird-support@yahoogroups.com, "Fabiano" <fabianoaspro@> wrote:
> >
> > Firebird is extreme slow with UTF-8 character set.
> >
> > Try it:
> > Create the same table with/without UTF-8 encoding, put 5000 records with no
> > indexes then do:
> >
> > 'select * from table order by fieldA
> >
> > select * from table where FieldA='xxxx'
> >
> >
> >
> > Cheers
> >
> >
> >
> >
> >
> > [Non-text portions of this message have been removed]
> >
>