Subject RE: [firebird-support] Restore error during unique index creation
Author Leyne, Sean
Thomas,

> The problem with gbak is, that especially in the restore process, gbak
> becomes CPU bound at the index creation stage very quickly. CPU bound on
> a single physical core. gbak can't utilize a SMP environment at the moment.

The limitation doesn't lie with gbak but rather the internal read/sort/merge functions of the engine itself, which affects all SELECT with ORDER BY statements.

The reality is that index build operations perform a very basic SELECT RDB$DB_Key, (Field[s] for index) FROM Table ORDER BY 2, for each index of a table. So, each index is built using a basic SELECT statement.

The engine has a rather simple loop for handling most all SELECT statements, which involves reading the rows into a temp memory block. If the block gets too big, the loop then invokes a function to sorts the block, write the block to disk and also merges temporary blocks.

All of these steps are executed/handled by a single 'threaded' process.

It has been my hope to convince one of the FB developers to break up these operations such that the reading of row into the temporary block, would be separated from the sorting of the block and the merging of blocks.

Such, that it could be possible to significantly improve the performance of SELECT w/SORT operations. (This would result in higher CPU load for FB servers, but it is my experience that most database servers are disk IO constrained, not CPU, so the impact of the would not be a significant as one might think)

Another benefit of separating the steps/operations in separate threads could be the ability to use file compression on the temporary blocks, which would resolve another known FB limitation -- SELECT which generate results larger than the size of temporary block need free disk space for the complete result (in raw bytes). Compression would reduce the amount of free disk space required (could also make the use of RAM disks more feasible). Without multiple threads, the "cost" of the compression would too significant for SELECT/SORT operations.


> If you have a look on a thread in firebird-devel started on Sept. 10, 2012,
> where I was mentioning InterBase XE3 got improved by parallel index
> creation at restore. Sounds like a good move in InterBase XE3, although I
> haven't tried that enhancement.

I haven't tried XE3 myself, either, but my reading of the XE3 release notes suggests that they didn't attack the basic SELECT/Sort 'problem' above.

Rather they designed the index rebuild operations to read all of the data required for multiple indexes in a single pass and then re-use the temporary block(s) and perform sorts for the indexes using the existing temporary blocks/data rather than having to re-read the entire table again. The benefit is significant reduction in disk IO as well as engine and OS page cache 'exhaustion'.


Sean