| Subject | Re: Record Encoding |
|---|---|
| Author | Roman Rokytskyy |
| Post date | 2005-05-12T22:36:31Z |
> You're a Java guy with builtin zip support. Why don't you try
> inflating and deflating your favorite 20,000 blobs and get a handle
> on the costs of compression and decompression and some idea of
> the efficiency.

Jim, as I wrote, in this case I do not care about the CPU cycles, I do
care about page fetches. The tricky thing is that I usually know on
the client which "page" is needed - I know the offset in the stream
where to start reading.
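Something like this would measure what Jim suggests, using plain
java.util.zip (the 2 MB size, the random fill and the compression level
are placeholders I made up; real blob content will behave differently):

```java
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressionCost {
    public static void main(String[] args) throws Exception {
        // Stand-in for one 2 MB blob. Random bytes are the worst case
        // for deflate; real index data should compress better.
        byte[] blob = new byte[2 * 1024 * 1024];
        new Random(42).nextBytes(blob);

        byte[] compressed = new byte[blob.length + 64 * 1024];
        byte[] restored = new byte[blob.length];

        long t0 = System.nanoTime();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(blob);
        deflater.finish();
        int clen = 0;
        while (!deflater.finished())
            clen += deflater.deflate(compressed, clen, compressed.length - clen);
        deflater.end();
        long t1 = System.nanoTime();

        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, clen);
        int ulen = 0;
        while (!inflater.finished())
            ulen += inflater.inflate(restored, ulen, restored.length - ulen);
        inflater.end();
        long t2 = System.nanoTime();

        System.out.printf("deflate %d ms, inflate %d ms, ratio %.3f%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000,
                (double) clen / blob.length);
    }
}
```

Running that in a loop over the real 20,000 blobs gives the CPU side of
the picture; the page-fetch side still needs the database.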
> Historical note: I was working for DEC's disk engineering group
> while productizing the first JRD. They sold disks and didn't want
> to hear anything about compression. I had to sell them compression
> as a performance feature. Worked for them. Maybe it will work with
> Roman.

Sure, if it turns out that fetching my blob into memory, decompressing
it there, and finally seeking is faster than accessing the page
directly...
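In code, the two paths I mean look roughly like this. A sketch against
plain JDBC; the table and column names are invented, and whether
InputStream.skip() turns into a server-side blob seek or just reads and
discards bytes depends on the driver:

```java
import java.io.InputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.zip.Inflater;

public class BlobAccessPaths {
    static final int PAGE_SIZE = 4096;

    // Path 1: open the stored blob and seek to the needed "page".
    static byte[] readPageDirect(Connection con, long id, long offset)
            throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT idx_data FROM idx_segment WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                Blob blob = rs.getBlob(1);
                try (InputStream in = blob.getBinaryStream()) {
                    long skipped = 0;
                    while (skipped < offset)
                        skipped += in.skip(offset - skipped);
                    byte[] page = new byte[PAGE_SIZE];
                    int read = 0, n;
                    while (read < PAGE_SIZE
                            && (n = in.read(page, read, PAGE_SIZE - read)) > 0)
                        read += n;
                    return page;
                }
            }
        }
    }

    // Path 2: fetch the whole compressed blob, inflate it in memory,
    // then copy the page out of the inflated image.
    static byte[] readPageFromCompressed(byte[] compressedBlob,
            int uncompressedLen, long offset) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressedBlob);
        byte[] all = new byte[uncompressedLen];
        int done = 0;
        while (!inflater.finished() && done < uncompressedLen)
            done += inflater.inflate(all, done, uncompressedLen - done);
        inflater.end();
        byte[] page = new byte[PAGE_SIZE];
        System.arraycopy(all, (int) offset, page, 0, PAGE_SIZE);
        return page;
    }
}
```

Path 2 pays for fetching and inflating the whole blob on every access;
the question is whether fewer page fetches on the server make up for
that.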
> I suspect you're wrong. They probably use a 32 bit object space,
> but the objects themselves are outside the object space.

Do you mean the "permanent generation" heap where classes are kept? I
do not care about that one, since it only limits the number of classes
that can be loaded into the VM. I do care about the "normal" heap,
though, and it is not possible to specify -Xmx1500m to the Sun JVM on
Linux. That was on a 32-bit machine (a 2-CPU box with 4 GB RAM, and
also on a 32-bit zLinux machine).
> In any case, you can always spend an extra $49.95 and buy a 64 bit
> machine.

If that solves the problem - yes. I have no experience with x86 64-bit
machines, only Solaris on an ES4000. And that is no longer $49.95...
> > Fetch 2 MB blob into memory in order to get 4k block from it? Not
> > very efficient, isn't it?
>
> If one blob fetch out of 100,000 is for an interior segment, yes it
> is. Almost nobody does that sort of thing for a number of reasons,
> the most important of which is that it performs like a pig across a
> wire.

This structure is an index "segment" - one is full-text, another is
spatial. The code jumps from one "record" to another in a fashion that
can be considered random. Funnily enough, when I tried a scheme with
~4k VARCHARs in each record, accessing them by PK, it performed approx.
4-5 times worse than one big BLOB with seek (the page size was 4k).
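For completeness, the row-per-page variant I tried was essentially this
(names invented; VARCHAR with CHARACTER SET OCTETS to hold binary
data):

```java
// CREATE TABLE idx_page (
//     segment_id INTEGER NOT NULL,
//     page_no    INTEGER NOT NULL,
//     data       VARCHAR(4096) CHARACTER SET OCTETS NOT NULL,
//     PRIMARY KEY (segment_id, page_no)
// );

static byte[] readPageByPk(Connection con, int segmentId, int pageNo)
        throws Exception {
    try (PreparedStatement ps = con.prepareStatement(
            "SELECT data FROM idx_page WHERE segment_id = ? AND page_no = ?")) {
        ps.setInt(1, segmentId);
        ps.setInt(2, pageNo);
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getBytes(1);
        }
    }
}
```

Every page read here costs a PK index walk plus a data page fetch plus
a statement round trip, while the blob variant seeks within an already
open stream - which could explain the 4-5x difference, though that is
only my guess.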
> If you can't tell the difference, why should you care?

Sure, if there is no difference, or the difference is less than 30%, I
do not care. OK, so far the only conclusion we can draw is that a
performance test is needed. I will try to create a standalone test
case, then you can experiment with compression.
Roman