Subject Re: [IB-Architect] SQL Scripts
Author Jim Starkey
At 06:42 PM 12/7/01 -0700, Jason Wharton wrote:
>
>The "wound" of situations I have confronted is poor use of network during a
>remote batch load process. The bottom line I would like to see solved is a
>way to initiate a "batch load" from the client to the server and use the
>network efficiently.
>

It is always a mistake to look at only one element of performance.
Performance must always be considered in the context of a complete
system. So let's look at the tradeoffs of a remote restore operation.
The major potential system bottlenecks are these:

1. Server cpu cycles
2. Server disk bandwidth
3. Network throughput
4. Network latency

Consider two alternatives for the restore: a single compound SQL
string with an insert per record, and a single prepared "insert"
statement receiving individual data packets.
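
In JDBC terms the two look roughly like the sketch below. The
table, column names, and connection URL are hypothetical, and
addBatch() of literal strings is only the closest portable
stand-in for a single compound SQL script:

    import java.sql.*;

    public class RestoreAlternatives {
        public static void main(String[] args) throws SQLException {
            // Hypothetical URL; any JDBC driver would do.
            Connection conn =
                DriverManager.getConnection("jdbc:example:restore");

            // Alternative 1: literal SQL, an insert per record.
            // Every record pays the full parse/compile cost.
            try (Statement stmt = conn.createStatement()) {
                stmt.addBatch("INSERT INTO t (id, name) VALUES (1, 'alpha')");
                stmt.addBatch("INSERT INTO t (id, name) VALUES (2, 'beta')");
                stmt.executeBatch();
            }

            // Alternative 2: prepare once, then send each record
            // as a data packet; the server only executes.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 1); ps.setString(2, "alpha"); ps.executeUpdate();
                ps.setInt(1, 2); ps.setString(2, "beta");  ps.executeUpdate();
            }

            conn.close();
        }
    }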

Disk bandwidth, obviously, is identical.

I think we can agree that, plus or minus some overhead for packet
accounting, approximately the same number of bytes are transmitted.
The zillion batched "insert" statements are pudgier, containing
keywords, punctuation, whitespace, and numbers expressed as strings.
Offsetting this are the packet headers on transmitted data records.
We'll call network throughput a wash.

There is no question that the aggregate network-induced latency
for the single batched "insert" statement is much lower: one
round trip instead of one per record, and a million per-record
round trips at 1 ms apiece add over a quarter of an hour. If
elapsed time for the restore were the only consideration, the
single compound "insert" statement would win hands down.

But in most complex systems, the server cpu is the limiting factor
in system performance, particularly since Interbase doesn't work
worth a damn on multi-processors. For each record in the restore
stream, the single batch insert must:

1. Perform a precedence-based parse of the input statement,
allocating a block for each production in the parse
tree.
2. Translate the SQL syntax tree into BLR.
3. Parse the BLR.
4. Perform a semantic analysis of the parse tree, resolving
names, evaluating protections, determining data types of
various expressions, etc. Again, a minimum of one block
per node in the statement's semantic tree is required.
5. Perform another optimization pass over the semantically
complete execution tree.
6. Instantiate a statement instance to execute the statement.
7. Execute the statement.
8. Destroy the statement instantiation.
9. Destroy the statement semantic tree.
10. Destroy the BLR.
11. Destroy the SQL syntax tree.

The alternative per record operation is:

1. Receive a structured record packet (data converted to
target format on the client, where CPU is plentiful).
2. Execute the statement.

If you compare the relative costs of a SQL statement life cycle
and execution of a prepared statement, you will find more than
a 1000:1 ratio (measurement is left as an exercise for the
reader).
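
A rough harness for that exercise, assuming any JDBC connection
(the URL and table are hypothetical; run it against a local
engine, or per-round-trip latency will swamp the CPU difference
you are trying to measure):

    import java.sql.*;

    public class LifecycleCost {
        static final int RECORDS = 10_000;

        public static void main(String[] args) throws SQLException {
            Connection conn =
                DriverManager.getConnection("jdbc:example:restore");

            // Full lifecycle per record: parse, translate, analyze,
            // optimize, instantiate, execute, destroy.
            long t0 = System.nanoTime();
            for (int i = 0; i < RECORDS; i++) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO t (id) VALUES (" + i + ")")) {
                    ps.executeUpdate();
                }
            }
            long lifecycle = System.nanoTime() - t0;

            // Execution only: prepared once, executed per record.
            long t1 = System.nanoTime();
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t (id) VALUES (?)")) {
                for (int i = 0; i < RECORDS; i++) {
                    ps.setInt(1, i);
                    ps.executeUpdate();
                }
            }
            long prepared = System.nanoTime() - t1;

            System.out.printf("lifecycle/prepared: %.1f : 1%n",
                              (double) lifecycle / prepared);
        }
    }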

Worse, the trends are against the batched SQL statement. You
can easily and cheaply add network bandwidth to a LAN by
replacing a hub with a switch, moving from 10 Mb to 100 Mb to
Gigabit Ethernet, installing 32- or 64-bit cards, and adding
a second processor to babysit the Ethernet while the first
slogs away running database code.

So unless you're running on a dial-up line, your approach
loses badly today and even worse tomorrow.

----------------------------------------------------------

That said, the best solution is none of the above. The best
way to do almost all application transactions, including
update and restore, is to execute the application code inside
the database engine, eliminating almost all network-related
costs, especially the pesky and annoying network latency.

Application backup and restore in Netfrastructure is implemented
in Java code (with a modest private extension to JDBC) that
runs as part of the embedded base application within the
engine. (Netfrastructure doesn't have a database, it has a
Content Store. There isn't a difference.) The backup or
restore data stream can be either a local file or a remote
FTP server.

The "modest" extention to JDBC are NfsResultSet.getRecord()
and NfsPreparedStatement.putRecord(byte[] array). The
former packages all column values into a self-describing,
architecturally tagged byte array. The second call takes
a byte array from the getRecord() and zips it into the
corresponding parameters.
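
In use, the pair might look like the sketch below. Only
getRecord() and putRecord(byte[]) come from the description
above; the stand-in interfaces, the length-prefixed framing,
and the per-record executeUpdate() are illustrative
assumptions:

    import java.io.*;
    import java.sql.SQLException;

    public class BackupRestoreSketch {
        // Minimal stand-ins for the private Netfrastructure
        // extensions; the real interfaces are not public.
        interface NfsResultSet {
            boolean next() throws SQLException;
            byte[] getRecord() throws SQLException;
        }
        interface NfsPreparedStatement {
            void putRecord(byte[] record) throws SQLException;
            int executeUpdate() throws SQLException;
        }

        // Backup: walk the result set, writing each self-describing
        // record with a length prefix (the framing is assumed).
        static void backup(NfsResultSet rs, DataOutputStream out)
                throws SQLException, IOException {
            while (rs.next()) {
                byte[] record = rs.getRecord();
                out.writeInt(record.length);
                out.write(record);
            }
        }

        // Restore: read each record back, zip it into the
        // parameters of a prepared insert, and execute.
        static void restore(NfsPreparedStatement ps, DataInputStream in)
                throws SQLException, IOException {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    break;
                }
                byte[] record = new byte[length];
                in.readFully(record);
                ps.putRecord(record);
                ps.executeUpdate();
            }
        }
    }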


Jim Starkey