Subject | Re: [firebird-support] why Blob is so slow ?
---|---
Author | Tupy... nambá
Post date | 2012-04-19T11:52:05Z
Alexandre,
From my point of view, I prefer to avoid using BLOB fields. First, because these kinds of fields are not suited to searches of any kind (most of them are pictures). Second, because they normally hold very large content, which makes the database grow considerably. I think the most important property of a database is its search capability, and fields that cannot be searched detract from that.
I prefer to store the files outside the database and keep only their paths inside it. That way you keep speed in all operations (searches and backups/restores) without a meaningful increase in database size.
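As a minimal sketch of that path-instead-of-blob approach (my illustration, not from the original post): SQLite is used here only because it ships with Python, and the table and directory names are hypothetical; the same pattern applies to Firebird.

```python
import os
import sqlite3
import tempfile

# Hypothetical storage directory for the binary payloads.
storage_dir = tempfile.mkdtemp(prefix="doc_store_")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document_file (documentid INTEGER, filename TEXT, path TEXT)"
)

def store_document(documentid, filename, data):
    # Write the binary payload to the file system...
    path = os.path.join(storage_dir, f"{documentid}_{filename}")
    with open(path, "wb") as f:
        f.write(data)
    # ...and keep only the small, easily backed-up path in the database.
    conn.execute(
        "INSERT INTO document_file VALUES (?, ?, ?)",
        (documentid, filename, path),
    )
    return path

store_document(1, "page1.jpg", b"\xff\xd8fake-jpeg-bytes")
row = conn.execute(
    "SELECT path FROM document_file WHERE documentid = 1"
).fetchone()
print(open(row[0], "rb").read() == b"\xff\xd8fake-jpeg-bytes")  # prints True
```

The trade-off, of course, is that the database no longer guarantees the files' existence or transactional consistency; the application must keep the directory and the table in sync.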
I'm not sure about the reasons for the backup/restore speed problem, but I believe that what happens inside the database is much like what happens at the OS level: when adjacent areas are full, the OS or the database engine must look for distant areas to store parts of the data, causing fragmentation. To access the complete data, the OS or database engine must reassemble the pieces before delivering them to the client. On top of that, the database suffers from fragmentation of the database file itself at the disk level.
On file servers, file fragmentation is normally low (you don't edit files directly on the server), and you can still defragment the files.
With SQL Server, you find discussions about fragmentation of internal tables and indexes, and there are commands to repair it.
With Firebird/InterBase, nobody talks about that, but we know it happens and can become a problem as the database grows. BLOBs are the worst offenders here, affecting not only the BLOB fields and their data but also fields and data of other types. And there are no commands (that I have ever seen) for internal defragmentation of the database.
Try some experiments on this, comparing different solutions to the same problem. A database filled all at once may not show great differences, but databases filled gradually in day-to-day use will, after a long enough time, show meaningful differences.
Roberto Camargo, Rio de Janeiro / Brazil
--- On Thu, 4/19/12, Alexandre Benson Smith <iblist@...> wrote:
From: Alexandre Benson Smith <iblist@...>
Subject: [firebird-support] why Blob is so slow ?
To: firebird-support@yahoogroups.com
Date: Thursday, April 19, 2012, 2:12 AM
For some time I have wondered why blobs are so slow during backup/restore.
When I access a single blob it doesn't seem slow, but during
backup/restore I can see that the table holding the blobs
takes a very long time to process.
Today I created a simple test case that resembles my real scenario:
Two Tables
CREATE TABLE DOCUMENT
(
DOCUMENTID integer NOT NULL,
PRODUCTID integer,
COMPANYID integer,
KIND char(1) NOT NULL COLLATE PT_BR,
DESCRIPTION varchar(40) NOT NULL,
CONSTRAINT PK_DOCUMENT PRIMARY KEY (DOCUMENTID)
);
CREATE TABLE DOCUMENTOBLOB
(
DOCUMENTBLOBID integer NOT NULL,
DOCUMENTID integer,
ITEM integer NOT NULL,
BINARYDATA blob sub_type 0 NOT NULL,
FILENAME varchar(255) NOT NULL COLLATE PT_BR,
CONSTRAINT PK_DOCUMENTBLOB PRIMARY KEY (DOCUMENTBLOBID),
CONSTRAINT UNQ_DOCUMENTBLOB UNIQUE (DOCUMENTID, ITEM)
);
Table Document has 469 records
Table DocumentBlob has 463 records
The design was made to support more than one BLOB per document (e.g. many
JPG pages, or a mix of JPG, XLS and PDF)
The database is 245MB.
I created a simple application to measure the size of the blobs; the
total binary data is 236MB.
The blobs are not big: one of 37MB, one of 4MB, two of 2MB, twenty-eight
between 1MB and 2MB, and the rest less than 1MB. The average size is
around 500KB per blob.
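Those figures are self-consistent; a quick back-of-the-envelope check, using only the numbers quoted above:

```python
total_mb = 236      # total binary data reported above, in MB
blob_count = 463    # rows in DOCUMENTOBLOB
avg_kb = total_mb * 1024 / blob_count
print(round(avg_kb))  # prints 522, consistent with "around 500KB per blob"
```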
Here are the timing for a back-up restore:
# time /opt/firebird/bin/gbak blob_test.fdb blob_test.fbk -user sysdba
-password masterkey -t
real 0m6.927s
user 0m0.671s
sys 0m1.191s
# time /opt/firebird/bin/gbak blob_test.fbk blob_teste2.fdb -rep -user
sysdba -password masterkey -t
real 10m8.894s
user 0m0.042s
sys 0m0.037s
I think that is too slow for processing fewer than 1,000 records and 250MB of data.
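To put those timings in perspective, the implied throughput can be worked out from the numbers above (my calculation, not part of the original message):

```python
db_mb = 245                  # database size reported above, in MB
restore_s = 10 * 60 + 8.894  # wall-clock time of the gbak restore
backup_s = 6.927             # wall-clock time of the gbak backup

print(f"restore: {db_mb / restore_s:.2f} MB/s")  # ~0.40 MB/s
print(f"backup:  {db_mb / backup_s:.1f} MB/s")   # ~35 MB/s
```

A restore running at well under 1 MB/s, with CPU and I/O both nearly idle (see below), suggests the time is going somewhere other than raw data transfer.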
Some more info:
during the backup or restore, CPU and I/O usage are low:
Tasks: 93 total, 1 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 97.3%id, 2.7%wa, 0.0%hi, 0.0%si,
0.0%st
Tasks: 93 total, 1 running, 92 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Firebird Version:
# /opt/firebird/bin/fbserver -z
Firebird TCP/IP server version LI-V2.1.4.18393 Firebird 2.1
gstat -a -r output:
Database "blob_teste2.fdb"
Database header page information:
Flags 0
Checksum 12345
Generation 17
Page size 16384
ODS version 11.1
Oldest transaction 1
Oldest active 2
Oldest snapshot 2
Next transaction 9
Bumped transaction 1
Sequence number 0
Next attachment ID 5
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 1
Creation date Apr 18, 2012 19:02:10
Attributes force write
Variable header data:
Sweep interval: 20000
*END*
Database file sequence:
File blob_teste2.fdb is the only file
Analyzing database pages ...
DOCUMENTO (128)
Primary pointer page: 137, Index root page: 138
Average record length: 35.67, total records: 469
Average version length: 0.00, total versions: 0, max versions: 0
Data pages: 3, data page slots: 3, average fill: 50%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 2
80 - 99% = 0
Index PK_DOCUMENTO (0)
Depth: 1, leaf buckets: 1, nodes: 469
Average data length: 1.01, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
DOCUMENTOBLOB (129)
Primary pointer page: 139, Index root page: 140
Average record length: 40.37, total records: 463
Average version length: 0.00, total versions: 0, max versions: 0
Data pages: 7, data page slots: 7, average fill: 90%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 7
Index PK_DOCUMENTOBLOB (0)
Depth: 1, leaf buckets: 1, nodes: 463
Average data length: 1.01, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Index UNQ_DOCUMENTOBLOB (1)
Depth: 1, leaf buckets: 1, nodes: 463
Average data length: 5.15, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Does anyone have any comments about it?
see you !
Alexandre