Subject: Blob Compress -- Some Numbers
Author: Jim Starkey
I took a "documents" table from one of my production databases and
crunched some numbers. The table had 1,377 blobs, summarized as follows:

MIME type                      Count  Avg size (bytes)  Avg compressed (bytes)
-----------------------------  -----  ----------------  ----------------------
application/msword               768            122767                   88694
application/octet-stream         420            108876                   82103
application/pdf                  153           1048402                  896624
application/vnd.lotus-wordpro      4             41423                   17888
application/vnd.ms-excel           3             79872                   20694
application/vnd.rn-realmedia       9           3583755                 3272342
application/x-macbinary            1             19968                    1702
image/gif                          3             37806                   37745
image/jpeg                         1            133523                  124465
image/pjpeg                        6            245316                  238838
text/html                          8             16459                    4528
text/plain                         1                 1                       9
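To read the table as ratios rather than raw averages, here is a small Java sketch that divides the average compressed size by the average size for each MIME type, and rebuilds approximate totals as count times average. The class name BlobRatios is hypothetical; the numbers are copied from the table, and because the averages are rounded, the rebuilt totals only approximate the aggregate figures quoted in this message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: ratios and approximate totals derived from the table's averages.
final class BlobRatios {
    // MIME type -> {count, average size, average compressed size}, copied from the table.
    static final Map<String, long[]> ROWS = new LinkedHashMap<>();

    static {
        ROWS.put("application/msword", new long[] {768, 122767, 88694});
        ROWS.put("application/octet-stream", new long[] {420, 108876, 82103});
        ROWS.put("application/pdf", new long[] {153, 1048402, 896624});
        ROWS.put("application/vnd.lotus-wordpro", new long[] {4, 41423, 17888});
        ROWS.put("application/vnd.ms-excel", new long[] {3, 79872, 20694});
        ROWS.put("application/vnd.rn-realmedia", new long[] {9, 3583755, 3272342});
        ROWS.put("application/x-macbinary", new long[] {1, 19968, 1702});
        ROWS.put("image/gif", new long[] {3, 37806, 37745});
        ROWS.put("image/jpeg", new long[] {1, 133523, 124465});
        ROWS.put("image/pjpeg", new long[] {6, 245316, 238838});
        ROWS.put("text/html", new long[] {8, 16459, 4528});
        ROWS.put("text/plain", new long[] {1, 1, 9});
    }

    /** Average compressed size / average size; above 1.0 means zlib made it bigger. */
    static double ratio(String mimeType) {
        long[] row = ROWS.get(mimeType);
        return (double) row[2] / row[1];
    }

    /** Total number of blobs across all MIME types. */
    static long totalCount() {
        long total = 0;
        for (long[] row : ROWS.values()) {
            total += row[0];
        }
        return total;
    }

    /** Approximate total bytes: sum of count * average, for column 1 (original) or 2 (compressed). */
    static long approxTotal(int column) {
        long total = 0;
        for (long[] row : ROWS.values()) {
            total += row[0] * row[column];
        }
        return total;
    }
}
```

With these numbers, text/html compresses to about 28% of its size, while the already-compressed formats (GIF, JPEG, RealMedia) stay above 90%, and the 1-byte text/plain blob comes out at a ratio of 9.0.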



The aggregate size of the blobs was 334,948,746 bytes. The blobs represent
whatever the government workers in the city of Amesbury, Massachusetts,
thought was worth sharing. Normal content is managed in Word, which
explains the heavy skew toward application/msword. The Word documents
contained a total of about 1,200 images, mostly JPEGs.

I compressed each blob with zlib using default settings, writing both
the original and the compressed version to a new table. The aggregate
size of the compressed blobs was 271,077,508 bytes, about 81% of the
original.
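For anyone wanting to reproduce the compression step, here is a minimal Java sketch using java.util.zip.Deflater with its defaults (zlib wrapper, default compression level), which is one way to get zlib "default settings" from Java. The class and method names are hypothetical, and writing the results to a table is left out, since that depends on the database API.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical helper: compresses one blob the way zlib does with default settings.
final class BlobDeflate {
    static byte[] compress(byte[] blob) {
        Deflater deflater = new Deflater(); // DEFAULT_COMPRESSION, zlib header + Adler-32 trailer
        try {
            deflater.setInput(blob);
            deflater.finish(); // no more input: flush everything on the next deflate calls
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(64, blob.length / 2));
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }
}
```

The zlib framing (a 2-byte header and a 4-byte Adler-32 trailer around the deflate data) is why the 1-byte text/plain blob grew to 9 bytes; compressing a single byte with this sketch also yields 9 bytes.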

I started a Netfrastructure server from scratch and fetched all
uncompressed blobs from the new table. I then restarted the server and
fetched and decompressed all compressed blobs. Fetching the uncompressed
blobs took about 64 seconds of elapsed time; fetching and decompressing
the compressed blobs took about 58 seconds.
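Side by side, compression cut the bytes fetched by about 19% but the elapsed time by only about 9%. The sketch below just does that arithmetic in Java; the class name is hypothetical, and the inputs are the aggregate sizes and rounded timings quoted in this message.

```java
// Hypothetical helper: compares the byte savings and the time savings from this test.
final class CompressionTradeoff {
    static final long ORIGINAL_BYTES = 334_948_746L;   // aggregate uncompressed size
    static final long COMPRESSED_BYTES = 271_077_508L; // aggregate zlib size
    static final double UNCOMPRESSED_SECONDS = 64.0;   // fetch only (approximate)
    static final double COMPRESSED_SECONDS = 58.0;     // fetch + inflate (approximate)

    /** Fraction of bytes saved by compression. */
    static double bytesSaved() {
        return 1.0 - (double) COMPRESSED_BYTES / ORIGINAL_BYTES;
    }

    /** Fraction of elapsed time saved by fetching compressed blobs and inflating them. */
    static double timeSaved() {
        return 1.0 - COMPRESSED_SECONDS / UNCOMPRESSED_SECONDS;
    }

    /** Effective throughput in uncompressed bytes per second for a given elapsed time. */
    static double effectiveBytesPerSecond(double seconds) {
        return ORIGINAL_BYTES / seconds;
    }
}
```

Here bytesSaved() is roughly 0.19 and timeSaved() is 0.09375, so the effective rate rises only from about 5.2 to about 5.8 million uncompressed bytes per second.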

The test program was written in Java, but other than memory, the Java
overhead was insignificant, and in any case it was the same for both
cases. The Inflater implementation was a thin layer on the standard
zlib. The test was run against a debug build compiled with Visual Studio
6. My code tends not to show a dramatic difference between debug and
release builds, but I didn't try to measure this.
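Assuming the Inflater in question behaves like java.util.zip.Inflater (which in most JDKs is itself a thin native layer over zlib; it may not be the same class used in the test), the decompression side looks roughly like this. BlobInflate is a hypothetical name; the loop checks needsInput() so a truncated blob fails instead of spinning forever.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper: inflates a zlib-compressed blob back to its original bytes.
final class BlobInflate {
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater(); // default: expects zlib header and Adler-32 trailer
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(64, compressed.length));
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Input ran out (or a preset dictionary is required) before end of stream.
                    throw new IllegalArgumentException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            // Corrupt data: rethrow unchecked so callers need not declare it.
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end(); // release native zlib memory
        }
    }
}
```

Reusing one Inflater per thread (calling reset() between blobs) would avoid allocating native zlib state for every blob; the sketch keeps one instance per call for simplicity.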

The machine was a 1.3GHz Athlon with 768MB of memory. While
decompressing, the machine was, in the vernacular, beat to shit.

I'm losing my enthusiasm for compressed blobs. I'm not convinced the
big win is there.

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376
