Subject | Re: [Firebird-Architect] Re: SMP - increasing single-query performance? |
---|---|
Author | Alex Peshkov |
Post date | 2006-11-10T10:07:11Z |
Jim Starkey:
> While the opportunities for parallelism for queries are sparse, there
> are all sorts of interesting things that I'm doing in Falcon for the
> update side. Updates work this way:
>
> 1. The client thread performs updates in memory (records and indexes)
> 2. Blobs, however, are written to page images in the cache. When a
> blob is complete, a separate thread, the page writer, starts
> writing it.
> 3. At commit time, the updates (records and indexes) are written to a
> serial log. When the page writer is done writing all blobs, a
> commit record is written to the serial log, and the transaction is
> considered committed.
> 4. Post commit, another thread, the gopher (go fer this, go fer that)
> copies stuff (records and index updates) from the serial log to
> page images.
> 5. Yet another thread, the system scheduler, fires off periodically to
> checkpoint the page cache to the disk.
>
> So data from a single transaction passes through a client thread, the
> page writer, the gopher, and the scheduler. Letting updates run
> unblocked in memory until commit, then waiting only for the basic bits
> to hit the oxide, and letting other threads migrate data to disk later,
> leads to incredible update performance.
>
> Queries, however, are hard to parallelize. Better to have a large
> record cache so they just go fast.
>
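Very roughly, the hand-off described above maps onto the sketch below. It is
only an illustration of the pattern, not Falcon's code: SerialLog and Update
are made-up names, the page writer and blobs are left out, and the real serial
log is a durable on-disk structure rather than an in-memory queue.

```cpp
// Minimal sketch of the hand-off pattern, not Falcon's real classes:
// SerialLog here is just an in-memory queue standing in for the serial log.
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct Update { std::string table; std::string record; };

class SerialLog
{
public:
    void append(std::vector<Update> tx)      // step 3: commit reaches the log
    {
        std::lock_guard<std::mutex> g(mtx);
        queue.push_back(std::move(tx));
        cv.notify_one();
    }

    bool pop(std::vector<Update>& tx)        // consumed by the gopher
    {
        std::unique_lock<std::mutex> g(mtx);
        cv.wait(g, [this] { return !queue.empty() || closed; });
        if (queue.empty())
            return false;
        tx = std::move(queue.front());
        queue.pop_front();
        return true;
    }

    void close()
    {
        std::lock_guard<std::mutex> g(mtx);
        closed = true;
        cv.notify_all();
    }

private:
    std::deque<std::vector<Update>> queue;
    std::mutex mtx;
    std::condition_variable cv;
    bool closed = false;
};

int main()
{
    SerialLog log;

    // Step 4: the "gopher" migrates committed updates from the log to page images.
    std::thread gopher([&log] {
        std::vector<Update> tx;
        while (log.pop(tx))
            for (const Update& u : tx)
                std::cout << "gopher: apply " << u.record
                          << " to a page image of " << u.table << '\n';
    });

    // Steps 1-3: the client thread updates records in memory and touches the
    // shared log only once, at commit time.
    std::thread client([&log] {
        std::vector<Update> tx{{"EMPLOYEE", "rec1"}, {"EMPLOYEE", "rec2"}};
        log.append(std::move(tx));           // "commit": hand the work over
    });

    client.join();
    log.close();                             // no more transactions coming
    gopher.join();

    // Step 5 (not modelled here): a scheduler thread would periodically
    // checkpoint dirty pages from the cache to disk.
}
```

Even the toy version shows the essential point: the client thread blocks only
to hand its committed updates to the log, and migrating them to page images
and to disk is left to background threads.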
Trying to run separate parts of query execution in different threads, we
would sooner waste more time syncing between them. But there is one
exception: we can separate fetching the data to be sorted from the sort
itself, with a minimum of syncs between threads. Certainly this makes sense
only for really big sorts, where we do partial sorts and merge the results
later. The partial sorts can be started in separate threads, as long as we
have enough CPUs in the system. This should pay off when restoring big
databases. And when a crazy user requests a huge amount of data to be
returned by a single query, a parallel sort can be useful too.
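For illustration only, the partial-sort idea could look roughly like the
sketch below. This is not the engine's sort code: the equal-size runs, the
plain integer keys and std::inplace_merge are just assumptions to keep the
sketch short, while a real sorter works on sort records and merges the runs
as a stream.

```cpp
// Illustrative sketch: partial sorts in separate threads, one merge at the end.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

std::vector<std::int64_t> parallelSort(std::vector<std::int64_t> keys,
                                       unsigned workers)
{
    if (workers < 2 || keys.size() < workers)
    {
        std::sort(keys.begin(), keys.end());   // too small to be worth threads
        return keys;
    }

    // Split the fetched keys into roughly equal runs, one per worker/CPU.
    const std::size_t chunk = keys.size() / workers;
    std::vector<std::size_t> bounds;            // run w is [bounds[w], bounds[w+1])
    for (unsigned w = 0; w <= workers; ++w)
        bounds.push_back(w == workers ? keys.size() : w * chunk);

    // Each partial sort touches only its own range: no syncs until join().
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w)
        threads.emplace_back([&keys, lo = bounds[w], hi = bounds[w + 1]] {
            std::sort(keys.begin() + lo, keys.begin() + hi);
        });
    for (auto& t : threads)
        t.join();

    // Merge the sorted runs; a real external sort would stream-merge instead.
    for (unsigned w = 1; w < workers; ++w)
        std::inplace_merge(keys.begin(),
                           keys.begin() + bounds[w],
                           keys.begin() + bounds[w + 1]);

    return keys;
}

int main()
{
    std::vector<std::int64_t> keys{9, 3, 7, 1, 8, 2, 6, 5, 4, 0};
    unsigned cpus = std::thread::hardware_concurrency();
    for (std::int64_t k : parallelSort(keys, cpus ? cpus : 1))
        std::cout << k << ' ';
    std::cout << '\n';
}
```

The only synchronization points are thread startup and join(), which is why
it pays off only when each run is big enough to keep a CPU busy.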