Subject | Re: [Firebird-Architect] Blob Compress -- Some Numbers |
---|---|
Author | Fabricio Araujo |
Post date | 2005-05-17T02:23:06Z |
On Mon, 16 May 2005 19:23:41 -0400, Jim Starkey wrote:
>I took a "documents" table from one of my production databases and
>crunched some numbers. The table had 1,377 blobs, summarized as follows:
>
>MIMETYPE                       COUNT  Average Size  Average Compressed Size
>-----------------------------  -----  ------------  -----------------------
>application/msword               768        122767                    88694
>application/octet-stream         420        108876                    82103
>application/pdf                  153       1048402                   896624
>application/vnd.lotus-wordpro      4         41423                    17888
>application/vnd.ms-excel           3         79872                    20694
>application/vnd.rn-realmedia       9       3583755                  3272342
>application/x-macbinary            1         19968                     1702
>image/gif                          3         37806                    37745
>image/jpeg                         1        133523                   124465
>image/pjpeg                        6        245316                   238838
>text/html                          8         16459                     4528
>text/plain                         1             1                        9
>
>The aggregate size of the blobs was 334,948,746 bytes. The blobs represent
>whatever the government workers in the city of Amesbury, Massachusetts
>thought was worth sharing. Normal content is managed in Word, which
>explains the heavy skew. The Word documents had a total of about 1200
>images, mostly jpegs.

That explains the very little gain with Word docs.
Word docs without compressed images would compress MUCH more.

>I compressed each block with zlib using default settings, writing both
>the original and the compressed versions to a new table. The aggregate
>size of the compressed blobs was 271,077,508 bytes.

What was the octet-stream? Windows executables? If so, normally they
won't compress much, unless they have a lot of resource stuff (icons,
tons of menus, bmps).

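To put the quoted figures in perspective, here is a small sketch that turns them into space savings. The numbers are copied from Jim's message; the helper name `saving` is only illustrative. It suggests roughly 28% saved on the Word docs and roughly 19% overall, with the GIFs and JPEGs saving almost nothing.

```python
# (mime type, blob count, average size, average compressed size),
# copied from the table in the quoted message.
ROWS = [
    ("application/msword", 768, 122767, 88694),
    ("application/octet-stream", 420, 108876, 82103),
    ("application/pdf", 153, 1048402, 896624),
    ("application/vnd.lotus-wordpro", 4, 41423, 17888),
    ("application/vnd.ms-excel", 3, 79872, 20694),
    ("application/vnd.rn-realmedia", 9, 3583755, 3272342),
    ("application/x-macbinary", 1, 19968, 1702),
    ("image/gif", 3, 37806, 37745),
    ("image/jpeg", 1, 133523, 124465),
    ("image/pjpeg", 6, 245316, 238838),
    ("text/html", 8, 16459, 4528),
    ("text/plain", 1, 1, 9),
]

# Aggregate sizes in bytes, also quoted in the message.
TOTAL = 334_948_746
TOTAL_COMPRESSED = 271_077_508


def saving(size, compressed_size):
    """Fraction of space saved by compression (negative if the data grew)."""
    return 1.0 - compressed_size / size


for mime, count, avg, avg_compressed in ROWS:
    print(f"{mime:30s} {count:5d} {saving(avg, avg_compressed):8.1%}")

print(f"overall saving: {saving(TOTAL, TOTAL_COMPRESSED):.1%}")
```

Note that the single one-byte text/plain blob actually grows, since the zlib wrapper adds a header and a checksum around the compressed data.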
>I started a Netfrastructure server from scratch and fetched all
>uncompressed blobs of the new table. I restarted the server and fetched
>and decompressed all compressed blobs. The elapsed time for the
>uncompressed blobs was about 64 seconds, the elapsed time for fetching
>and decompressing the compressed blobs was about 58 seconds.

But this is because of Word docs with JPEGs. If they have only text,
the compression will be much better. The Word doc format is one of the most
compressible formats around.

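A toy illustration of that point, assuming Python's standard `zlib` module at its default level (as in Jim's test). The three inputs are made-up stand-ins, not real Word files: repeated text for a text-only document, random bytes for already-compressed JPEG data, and a mix of the two for a document that is mostly embedded images.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Size of zlib.compress(data) at the default level, relative to len(data)."""
    return len(zlib.compress(data)) / len(data)


# Stand-in for a text-only document: highly repetitive text.
# Real documents are less repetitive, so they would compress less well than this.
text_like = b"The city council approved the minutes of the last meeting. " * 2000

# Stand-in for embedded JPEGs: random bytes, which zlib cannot shrink.
jpeg_like = os.urandom(len(text_like))

# Stand-in for a document that is one quarter text and three quarters images.
quarter = len(text_like) // 4
mixed = text_like[:quarter] + jpeg_like[: 3 * quarter]

for name, data in (("text only", text_like),
                   ("images only", jpeg_like),
                   ("mixed", mixed)):
    print(f"{name:12s} compresses to {compressed_fraction(data):6.1%} of original size")
```

The mixed case ends up dominated by the incompressible part, which is the same effect as Word docs full of JPEGs.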
>The test step was in Java, but other than memory, the Java overhead was
>insignificant, and in any case, the same for both cases. The Inflater
>implementation was a thin layer on the standard zlib. The test was run
>on VisualStudio 6 compiled for debug. My code tends not to have a
>dramatic difference between debug and release compilation, but I didn't
>try to measure this.

Heavy processing code normally gets a good improvement when the
compiler applies optimization.
I don't know VS, but that is at least what happens in Delphi.

>The machine was a 768MB 1.3GHz Athlon. The machine while decompressing
>was, in the vernacular, beat to shit.

This is very bad.

>
>I'm losing my enthusiasm for compressed blobs. I'm not convinced the
>big win is there.
>
>--
>
>Jim Starkey
>Netfrastructure, Inc.
>978 526-1376