Subject: Re: [Firebird-Architect] Re: Record Encoding
Author: Jim Starkey
Roman Rokytskyy wrote:

>I'm not Dmitry, but it seems that I have an opposing opinion. I want
>to have a seek operation defined for BLOBs. This works very well for
>big BLOBs with a record-like structure, where only small parts of the
>stream are needed on the client. So I need an uncompressed BLOB, since
>the speed of accessing those small blocks matters more to me than space.
>
>
OK. Let's go there. Rdb had only segmented blobs. At Interbase, I
added stream blobs with seek. In Netfrastructure, I dropped both
segmented blobs and seek.

My initial thinking on blobs centered on the idea that blobs would be
much larger than available memory. That era, like the PDP-11, has
passed. It is now feasible, even preferable, to fetch the whole thing
into memory and process it there.

I'm not going to argue that there aren't applications where UDFs, blob
filters, embedded Java, or stored procedures are going to want to
manipulate blobs. I am going to argue that that capability shouldn't
dictate the on-disk storage format.

We can continue to support blob seek with the simple expedient of
fetching the blob, decompressing it into memory, and doing random
access there. The other 99.99% of cases get the full benefit of blob
compression.
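
For concreteness, here is a rough Java sketch of that expedient. The
class, and the assumption that the blob is stored as a raw RFC 1951
(DEFLATE) stream, are illustrative, not the engine's actual interfaces:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// "Seek" over a compressed blob: inflate the whole thing into memory
// once, then serve random-access reads from the buffer.
class InMemoryBlob {
    private final byte[] data;

    // compressed is assumed to be a raw RFC 1951 (DEFLATE) stream
    InMemoryBlob(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in =
                 new InflaterInputStream(compressed, new Inflater(true))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1)
                out.write(chunk, 0, n);
        }
        data = out.toByteArray();
    }

    // The seek operation: copy length bytes starting at offset.
    byte[] read(int offset, int length) {
        int end = Math.min(offset + length, data.length);
        byte[] slice = new byte[end - offset];
        System.arraycopy(data, offset, slice, 0, slice.length);
        return slice;
    }
}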

>Most likely we need another approach to BLOBs altogether. Currently a
>BLOB is only a byte stream, and BLOB types are mainly needed for
>filters. However, it seems we need full type support: a TEXT type
>(which should include the character set in its declaration or store
>text in Unicode), a COMPRESSED type (actually a BLOB, but stored in
>compressed form, probably with an attribute naming the compression
>scheme - zip, gzip, bzip2, etc.), and a BINARY type (just a byte
>stream). There is no need to convert TEXT to COMPRESSED or COMPRESSED
>to BINARY; those are independent types with their own attributes.
>Something similar to the ARRAY data type.
>
>
I don't think so. I added blob types to support this sort of thing,
and they almost never got used, and never the way I expected. I think
JDBC called it about right -- blobs and clobs, where the only
difference is that clobs are subject to character set conversion. Blob
compression should be like record compression, which is to say
transparent, but end-to-end efficiency suggests we offer both
compressed and decompressed access.
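
For reference, the JDBC shape of that distinction looks like this (the
DOCS table and its columns are hypothetical, used only for
illustration):

import java.io.InputStream;
import java.io.Reader;
import java.sql.Blob;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical table DOCS(BODY BLOB, NOTES CLOB).
class LobExample {
    static void readLobs(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT BODY, NOTES FROM DOCS")) {
            while (rs.next()) {
                Blob body = rs.getBlob(1);   // raw bytes, no conversion
                Clob notes = rs.getClob(2);  // driver applies charset conversion
                try (InputStream bytes = body.getBinaryStream();
                     Reader chars = notes.getCharacterStream()) {
                    // process bytes and chars here
                }
            }
        }
    }
}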

By the way, to the best of my knowledge, the actual encoding in zip
and gzip is RFC 1951 (DEFLATE); the differences are the file wrappers.
bzip2 is the exception: it uses the Burrows-Wheeler transform rather
than DEFLATE. If somebody knows otherwise, I'd like to hear about it.
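
For the zip/gzip pair this is easy to demonstrate in Java: strip the
gzip wrapper and what remains is a bare DEFLATE stream. (The fixed
10-byte header assumes no optional gzip fields, which is what
java.util.zip.GZIPOutputStream writes.)

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// A gzip stream is RFC 1951 DEFLATE data wrapped in a 10-byte header
// and an 8-byte CRC32/length trailer.
class GzipIsDeflate {
    public static void main(String[] args) throws Exception {
        byte[] original = "blobs and clobs".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(original);
        }
        byte[] gzipped = bos.toByteArray();

        // Strip the gzip header and trailer, leaving raw DEFLATE data.
        byte[] deflated = Arrays.copyOfRange(gzipped, 10, gzipped.length - 8);

        // Inflate with a bare (nowrap) RFC 1951 inflater.
        Inflater inflater = new Inflater(true);
        inflater.setInput(deflated);
        byte[] restored = new byte[original.length];
        inflater.inflate(restored);
        inflater.end();

        System.out.println(new String(restored, StandardCharsets.US_ASCII));
    }
}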

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376