Subject RE: [IB-Architect] Next ODS change - compression and performance
Author David Berg
The performance test was 1000 pages of 20,001 bytes each (the extra 1 was me
not paying enough attention). It was run on my notebook (Pentium 600 with
256MB RAM, NT 4.0, Delphi 5, code optimization on) with lots of other
processes up (but hopefully not active).

I wasn't trying to be very scientific, just trying to get a rough idea
of the order of magnitude. I also discovered the optimizer was
over-optimizing the test and removing some of the checksumming code.
So here's a slightly fairer test (1000 pages of 40,960 bytes):

38.8310sec Write Memory Block - 1000 times
2.6990sec Read Memory Block - 1000 times
2.0180sec Checksum Memory Block (pointer) - 1000 times
1.2910sec Checksum Memory Block (counter) - 1000 times
0.2460sec Checksum Memory Integer (counter) - 1000 times

The new block size does a better job of aligning reads and writes, and
the integer-based test shows the performance of handling 8 bytes at
once instead of one.
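
In rough C terms (my benchmark was written in Delphi, so take this as an
illustrative sketch rather than the code I actually ran, and the names
are made up), the difference is between summing the page one byte at a
time and summing it as 64-bit integers, eight bytes per add:

#include <stddef.h>
#include <stdint.h>

/* One byte per iteration: one load and one add per byte of the page. */
static uint32_t checksum_bytes(const uint8_t *page, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += page[i];
    return sum;
}

/* Eight bytes per iteration: sum the page as 64-bit integers.  Assumes
   the buffer is aligned and len is a multiple of 8 (40,960 is).  The
   two routines give different values; the point is the work per pass
   over the page, not checksum compatibility. */
static uint64_t checksum_words(const uint8_t *page, size_t len)
{
    const uint64_t *p = (const uint64_t *) page;
    const uint64_t *end = p + len / 8;
    uint64_t sum = 0;

    while (p < end)
        sum += *p++;
    return sum;
}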

By the way, here's the same run without optimization:

35.8740sec Write Memory Block - 1000 times
8.0240sec Read Memory Block - 1000 times
3.1290sec Checksum Memory Block (pointer) - 1000 times
2.8240sec Checksum Memory Block (counter) - 1000 times
0.8020sec Checksum Memory Integer (counter) - 1000 times

Clearly, having a good optimizing compiler is critical for this kind of
stuff. I also did a little testing with some assembly routines, and I
think you can take the best time and cut it in half again by using
hand-optimized assembly code...


Which brings us back to the main point of how much additional processing is
warranted to get how much compression?

I don't intend to answer the question; I just wanted to throw out some
ideas and get people thinking about it.

However, as a rule of thumb, I'd say that if compression, on average,
can cut the number of reads and writes in half, then it's worth a
performance hit of up to 25% of the time reading and writing takes
(anything less than that isn't worth all the extra code to do
compression, in my opinion...).
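
To put rough numbers on that using the optimized run above: writing 1000
pages took about 38.8 seconds. If compression halved the number of pages
written, that would save roughly 19.4 seconds, so by this rule of thumb
the compression code could cost up to about 9.7 seconds of CPU (25% of
38.8) before it stopped being worth it.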



-----Original Message-----
From: Leyne, Sean [mailto:InterbaseArchitecture@...]
Sent: Tuesday, November 07, 2000 5:45 PM
To: 'IB-Architect@egroups.com'
Subject: RE: [IB-Architect] Next ODS change (was: System table change)


David,

An interesting post, one question though.

I'm not clear: are you saying that you calculated the checksum of a page
1000 times (if so, what page size), or that you calculated the checksum
once, for a page of 1000 characters?

I expect that it's the first case, but just want to be sure.

(Also, what speed box did you run this on? - just for purposes of
comparison)


Sean


-----Original Message-----
From: David Berg [mailto:DaveBerg@...]
Sent: Tuesday, November 07, 2000 7:38 PM
To: IB-Architect@egroups.com
Subject: RE: [IB-Architect] Next ODS change (was: System table
change)

That looks pretty simple and fast (although I wonder if unrolling the
loop is defeating the optimizer's attempt to use LOOP and REP type code;
no matter, it's not going to make that big a difference - bigger
differences would be making sure checksum and p are compiled as
registers, and making sure there's no integer overflow handling code
turned on).

One small problem with the algorithm is that it marks the page as dirty,
but that's only an issue if you're using memory mapped files, which I
assume you're not (I've tried memory mapped files, and for my test cases
they were always slower than managing my own IO buffers).
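
For what it's worth, the distinction I mean is roughly this - a
bare-bones Win32 sketch (not code from the engine or from my tests; the
names are made up, and error handling and large-file offsets are glossed
over):

#include <windows.h>

#define PAGE_SIZE 40960

/* (a) Own buffer: explicit seek + ReadFile into memory we manage, so
   nothing goes back to disk unless we call WriteFile ourselves. */
BOOL read_page(HANDLE file, DWORD page_no, BYTE *buf)
{
    DWORD got = 0;

    if (SetFilePointer(file, page_no * PAGE_SIZE, NULL, FILE_BEGIN)
            == INVALID_SET_FILE_POINTER)
        return FALSE;
    return ReadFile(file, buf, PAGE_SIZE, &got, NULL) && got == PAGE_SIZE;
}

/* (b) Memory mapped: map the whole file and index into the view.  The
   OS faults pages in on first touch, and any store through the view -
   even zeroing a checksum field and putting it back - dirties the page
   so the OS writes it out again. */
BYTE *map_whole_file(HANDLE file)
{
    HANDLE map = CreateFileMapping(file, NULL, PAGE_READWRITE, 0, 0, NULL);

    if (map == NULL)
        return NULL;
    /* page N then lives at view + (SIZE_T) N * PAGE_SIZE */
    return (BYTE *) MapViewOfFile(map, FILE_MAP_WRITE, 0, 0, 0);
}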

If something that simple is chewing up that much CPU, then that doesn't
bode well for compression schemes. In fact, you got me curious, so I did
some benchmarking. The following tests were run with optimization ON.
Reading was done after writing, so the data was probably already in
memory:

0.3490sec Checksum Memory Block (counter) - 1000 times
0.6820sec Checksum Memory Block (pointer) - 1000 times
0.9020sec Read Memory Block, Pointer - 1000 times
20.4130sec Write Memory Block - 1000 times

Note that I happen to know the compiler does a better job of optimizing
the counter/index approach over the pointer approach, which is why I did
both methods. I did not unroll the loop, and I used byte-based pointers
instead of word or long-word based pointers (which would have been
faster, but I was trying to stack the deck towards slow).
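
For reference, the two styles look roughly like this in C (again, the
actual benchmark was Delphi, so this is just a sketch and the names are
made up); both sum the page one byte at a time, with no unrolling:

#include <stddef.h>

/* Pointer style: walk a byte pointer across the page. */
unsigned int checksum_ptr(const unsigned char *page, size_t len)
{
    const unsigned char *p = page;
    const unsigned char *end = page + len;
    unsigned int sum = 0;

    while (p < end)
        sum += *p++;
    return sum;
}

/* Counter/index style: the same byte-at-a-time sum written with an
   index variable, which some compilers turn into better register
   code than the pointer walk. */
unsigned int checksum_idx(const unsigned char *page, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += page[i];
    return sum;
}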


-----Original Message-----
From: Ann Harrison [mailto:harrison@...]
Sent: Tuesday, November 07, 2000 4:25 PM
To: IB-Architect@egroups.com
Subject: RE: [IB-Architect] Next ODS change (was: System table change)





>* 10% CPU time for checksums! Wow. I'd suggest a simpler checksum
>algorithm <grin>, but your idea of getting rid of them entirely is
>probably better yet.

For the record, I had nothing to do with identifying the problem
or recommending a solution. The algorithm is pretty simple:
....
old_checksum = page->pag_checksum;
page->pag_checksum = 0;
p = (ULONG*) page;
checksum = 0;

do
{
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
    checksum += *p++;
}
while (p < end);

page->pag_checksum = old_checksum;
....


Regards,

Ann

