Subject: Re: [IB-Architect] Hoard Memory Allocator
Author: Jim Starkey
At 10:41 PM 9/25/02 +0400, Nickolay Samofatov wrote:
>
>You are wrong, at least partially. Let's consider a simple case:
>You have 4 heaps A, B, C and D. Even in the worst case, when memory is
>distributed evenly between threads and heaps, if you lock heap D to
>deallocate some memory, heaps A, B and C can still be freely used
>for both allocation and deallocation.
>

My concern was not contention but mutex overhead. Whether you use
a single heap, a per-processor heap, or a per-thread heap, a mutex
must be seized to allocate a block and to release a block. Sure,
there is a cost to blocking a thread, but as long as the processor(s)
stay busy, I'm reasonably happy. Not blocking, of course, is
better.

If your argument is that with multiple heaps the probability
that a thread will have to wait on a mutex goes down, I agree
completely.
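
To put the cost I mean in code, here's a quick sketch (modern C++ for
brevity; not code from either server): one mutex per heap, seized on
every allocate and every release. A per-processor or per-thread scheme
is just N copies of the same thing, so the per-call mutex cost stays;
only the odds of a collision drop.

#include <cstdlib>
#include <mutex>

class Heap {
public:
    void* allocate(std::size_t size) {
        std::lock_guard<std::mutex> guard(lock);   // taken on every allocation
        return std::malloc(size);                  // stand-in for the real free-list work
    }
    void release(void* block) {
        std::lock_guard<std::mutex> guard(lock);   // taken on every release
        std::free(block);
    }
private:
    std::mutex lock;
};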

>I developed an SMP allocator for my real-time medical database project
>some time ago. I allocated several large heaps and chose a heap
>for allocation randomly. It almost eliminated lock time for my
>program. The Hoard test data demonstrate even better results because
>data usually has strong thread affinity in OO programs; Hoard is
>optimized for this case and seems to behave like my allocator in the
>worst case.
>
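
If I read your scheme right, it amounts to something like the sketch
below; the heap count, the header word, and the use of malloc under
the lock are my guesses, not your code.

#include <cstdlib>
#include <mutex>
#include <random>

const int HEAP_COUNT = 4;                  // "several large heaps" -- count assumed

struct LockedHeap {
    std::mutex lock;
    void* allocate(std::size_t size) {
        std::lock_guard<std::mutex> guard(lock);
        return std::malloc(size);
    }
    void release(void* block) {
        std::lock_guard<std::mutex> guard(lock);
        std::free(block);
    }
};

static LockedHeap heaps[HEAP_COUNT];

// Pick a heap at random so concurrent threads rarely contend on the same
// mutex; each block carries a one-word header naming its heap so release
// can find the right one.
void* smpAllocate(std::size_t size) {
    static thread_local std::mt19937 rng(std::random_device{}());
    std::size_t n = rng() % HEAP_COUNT;
    std::size_t* block = static_cast<std::size_t*>(
        heaps[n].allocate(size + sizeof(std::size_t)));
    if (!block)
        return nullptr;
    *block = n;                            // remember the owning heap
    return block + 1;
}

void smpRelease(void* memory) {
    std::size_t* block = static_cast<std::size_t*>(memory) - 1;
    heaps[*block].release(block);
}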

My point was that memory allocated in a database server is generally
shared equally by all threads, which pretty much negates the Hoard
benefit of reduced cache contention. If memory is allocated, used, and
released within a single thread, then, sure, the Hoard model probably
works like a champ. But that's not the way database servers behave.

Netfrastructure allocates a thread per database connection, which
returns to the thread barn when the connection is closed. This
works just fine because the scope of a Netfrastructure connection
is generally a single verb, genHTML. If I were building a server
for the client-server world, I'd allocate a thread on a per-verb
basis to minimize the number of threads (each thread requires
a stack that consumes a huge amount of virtual memory). In either
case, however, an active thread spends most of its time walking
internal data structures and the buffer pool. In its current
form, Firebird would benefit, since compiled statements are never
reused outside a single attachment, giving pretty good thread
affinity. But if you guys ever recognize the performance win
in compiled statement caching, the gain goes out the window.
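
Roughly, the per-verb scheme I have in mind looks like the sketch below
(names and structure are mine, not Netfrastructure's; shutdown handling
omitted): a small fixed pool of workers, each incoming verb queued to
whichever worker is free, so an idle connection holds no thread and no
stack.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class VerbPool {
public:
    explicit VerbPool(int workers) {
        for (int n = 0; n < workers; ++n)
            threads.emplace_back([this] { run(); });
    }

    // Called once per verb (request), not once per connection.
    void execute(std::function<void()> verb) {
        std::lock_guard<std::mutex> guard(lock);
        verbs.push(std::move(verb));
        wakeup.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> guard(lock);
            wakeup.wait(guard, [this] { return !verbs.empty(); });
            std::function<void()> verb = std::move(verbs.front());
            verbs.pop();
            guard.unlock();
            verb();                        // e.g. genHTML for one request
        }
    }

    std::mutex lock;
    std::condition_variable wakeup;
    std::queue<std::function<void()>> verbs;
    std::vector<std::thread> threads;
};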

One last thought. Since a Pentium-class machine maxes out with
less than $1K worth of memory, the scarce resource becomes
virtual address space. My memory allocators (you've got one)
cache "small blocks" and recombine "big blocks". As the number
of heaps goes up, the total amount of memory held free in those
caches goes up, reducing the amount available for anything else.
Netfrastructure lets an administrator control the record storage
garbage collection thresholds to balance memory between working
storage and cached records. Keeping the amount of working storage
at safe levels in a multi-heap implementation would necessarily
reduce the amount of space for cached records.
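
Concretely, the effect I'm worried about looks like the sketch below (a
generic small-block cache, not the allocator you have): each heap keeps
freed small blocks on its own per-size free lists, so with H independent
heaps the memory sitting idle in those caches grows roughly H-fold, and
that space comes straight out of what could hold cached records.

#include <cstdlib>

const std::size_t SMALL_LIMIT = 1024;      // assumed cutoff for "small" blocks
const int SLOTS = SMALL_LIMIT / 16 + 1;    // one free list per 16-byte size class

// One of these per heap. Sizes are assumed to be > 0, and the same size
// is passed to release that was passed to allocate.
struct SmallBlockCache {
    void* freeLists[SLOTS] = {};

    void* allocate(std::size_t size) {
        if (size <= SMALL_LIMIT) {
            int slot = static_cast<int>((size + 15) / 16);
            if (void* block = freeLists[slot]) {
                freeLists[slot] = *static_cast<void**>(block);
                return block;              // served from this heap's cache
            }
            return std::malloc(slot * 16); // rounded up to its size class
        }
        return std::malloc(size);          // big blocks: allocate exactly
    }

    void release(void* block, std::size_t size) {
        if (size <= SMALL_LIMIT) {
            int slot = static_cast<int>((size + 15) / 16);
            *static_cast<void**>(block) = freeLists[slot];
            freeLists[slot] = block;       // kept for reuse, not returned
        } else {
            std::free(block);              // big blocks go back for recombination
        }
    }
};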

Oh, need I mention that a machine maxed out with physical memory
running a single multi-threaded process never page faults?

Jim Starkey