Subject | RE: Re[2]: [Firebird-Java] Re: Streamed Blob |
---|---|
Author | Robert DiFalco |
Post date | 2003-10-06T18:02:41Z |
Just FYI. The break seems to be at 1100 bytes regardless of my blob
segment size, so I don't think it has anything to do with segmented
versus streamed blobs. I just think the string search for blobs works in
~1100 byte increments. The drag is that if you have a word that breaks
across this boundary, even that word cannot be matched. Changing the
segment size does not change this behavior.
R.
-----Original Message-----
From: Nickolay Samofatov [mailto:skidder@...]
Sent: Monday, October 06, 2003 10:30 AM
To: Robert DiFalco
Subject: Re[2]: [Firebird-Java] Re: Streamed Blob
I can demonstrate you the deficiency. There is a table "document"
which contains ~35000 records with text field description varchar(255)
that contains 100-160 chars of data.
Query:
select count(*) from document where description like <pattern>
<pattern> <time>
"%q" 0,2 s
"%a%q" 0,6 s
"%a%a%q" 1,8 s
"%a%a%a%q" 5,6 s
"%a%a%a%a%q" 14,9 s
"%a%a%a%a%a%q" 35,2 s
"%a%a%a%a%a%a%q" 74,7 s
.....
I'll probably take a look at this issue.
--
Nickolay Samofatov mailto:skidder@...
Yahoo! Groups Sponsor
ADVERTISEMENT
To unsubscribe from this group, send an email to:
Firebird-Java-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
segment size, so I don't think it has anything to do with segmented
versus streamed blobs. I just think the string search for blobs works in
~1100 byte increments. The drag is that if you have a word that breaks
across this boundary, even that word cannot be matched. Changing the
segment size does not change this behavior.
R.
-----Original Message-----
From: Nickolay Samofatov [mailto:skidder@...]
Sent: Monday, October 06, 2003 10:30 AM
To: Robert DiFalco
Subject: Re[2]: [Firebird-Java] Re: Streamed Blob
> Ugh, that kind of sucks (i.e. having to read the entire blob intorely
> memory). Seems (as you imply) that a better pattern matching algorithm
> is required, and one that can work on streamed data (i.e. does not
> on the entire blob being read into memory).Yep. Knuth-Morris-Pratt algorithm could do much better job.
I can demonstrate you the deficiency. There is a table "document"
which contains ~35000 records with text field description varchar(255)
that contains 100-160 chars of data.
Query:
select count(*) from document where description like <pattern>
<pattern> <time>
"%q" 0,2 s
"%a%q" 0,6 s
"%a%a%q" 1,8 s
"%a%a%a%q" 5,6 s
"%a%a%a%a%q" 14,9 s
"%a%a%a%a%a%q" 35,2 s
"%a%a%a%a%a%a%q" 74,7 s
.....
I'll probably take a look at this issue.
--
Nickolay Samofatov mailto:skidder@...
Yahoo! Groups Sponsor
ADVERTISEMENT
To unsubscribe from this group, send an email to:
Firebird-Java-unsubscribe@yahoogroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.