Subject: Re: [Firebird-Architect] Blob Compress -- Some Numbers
Author: Fabricio Araujo
On Mon, 16 May 2005 19:23:41 -0400, Jim Starkey wrote:

>I took a "documents" table from one of my production databases and
>crunched some numbers. The table had 1,377 blobs, summarized as follows:
> MIMETYPE                       COUNT  Average Size  Average Compressed Size
> -----------------------------  -----  ------------  -----------------------
> application/msword               768        122767                    88694
> application/octet-stream         420        108876                    82103
> application/pdf                  153       1048402                   896624
> application/vnd.lotus-wordpro      4         41423                    17888
> application/                       3         79872                    20694
> application/vnd.rn-realmedia       9       3583755                  3272342
> application/x-macbinary            1         19968                     1702
> image/gif                          3         37806                    37745
> image/jpeg                         1        133523                   124465
> image/pjpeg                        6        245316                   238838
> text/html                          8         16459                     4528
> text/plain                         1             1                        9
>The aggregate size of the blobs was 334,948,746. The blobs represent
>whatever the government workers in the city of Amesbury, Massachusetts
>thought was worth sharing. Normal content is managed in Word, which
>explains the heavy skew. The Word documents had a total of about 1200
>images, mostly jpegs.

That explains the very small gain with Word docs.
Word docs without embedded compressed images would compress MUCH more.

>I compressed each block with zlib using default settings, writing both
>the original and the compressed versions to a new table. The aggregate
>size of the compressed blobs was 271,077,508 bytes.

What was the octet-stream? Windows executables? If so, they normally
won't compress much, unless they have a lot of resource stuff (icons,
tons of menus, bmps).

>I started a Netfrastructure server from scratch and fetched all
>uncompressed blobs of the new table. I restarted the server and fetched
>and decompressed all compressed blobs. The elapsed time for the
>uncompressed blobs was about 64 seconds; the elapsed time for fetching
>and decompressing the compressed blobs was about 58 seconds.

But this is because of the Word docs with JPEGs. If they contained only
text, the compression would be much better. The Word doc format is one
of the most compressible formats around.
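The point is easy to demonstrate: zlib thrives on the redundancy of text but gains almost nothing on data that is already near maximum entropy, such as the JPEGs embedded in those Word docs. A small sketch (pseudo-random bytes standing in for JPEG data; the names here are illustrative):

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressRatio {
    // Deflate with zlib defaults and return only the compressed size.
    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        // Worst case for incompressible input is slightly larger than the
        // input itself, so leave some headroom.
        byte[] buf = new byte[input.length + input.length / 64 + 128];
        int total = 0;
        while (!d.finished())
            total += d.deflate(buf, total, buf.length - total);
        d.end();
        return total;
    }

    public static void main(String[] args) {
        int n = 200_000;
        // "Text": a repeated phrase -- the kind of redundancy zlib exploits.
        byte[] text = new byte[n];
        byte[] phrase = "the quick brown fox jumps over the lazy dog. ".getBytes();
        for (int i = 0; i < n; i++) text[i] = phrase[i % phrase.length];
        // "Image": pseudo-random bytes, already near maximum entropy.
        byte[] image = new byte[n];
        new Random(42).nextBytes(image);

        System.out.printf("text:  %d -> %d%n", n, deflatedSize(text));
        System.out.printf("image: %d -> %d%n", n, deflatedSize(image));
    }
}
```

The text buffer shrinks to a tiny fraction of its size, while the random buffer stays essentially the same, which mirrors the image/jpeg and image/pjpeg rows in the table above.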

>The test step was in Java, but other than memory, the Java overhead was
>insignificant, and in any case the same for both cases. The Inflater
>implementation was a thin layer on the standard zlib. The test was run
>on VisualStudio 6 compiled for debug. My code tends not to have a
>dramatic difference between debug and release compilation, but I didn't
>try to measure this.

Heavy processing code normally gets a good speedup
when the compiler applies optimizations.
I don't know VS, but that is at least what happens in Delphi.

>The machine was a 1.3GHz Athlon with 768MB. The machine, while
>decompressing, was, in the vernacular, beat to shit.

This is very bad.

>I'm losing my enthusiasm for compressed blobs. I'm not convinced the
>big win is there.
>Jim Starkey
>Netfrastructure, Inc.
>978 526-1376