Subject Re: [IB-Architect] Space usage with TCP/IP & Varchars
Author Jim Starkey
At 11:26 AM 6/19/00 +0100, Jason Chapman wrote:
>All,
>
>It would appear that when data in VARCHAR's are returned over the network
>that they are fixed width, is this true?
>
>I.e. If I have a VARCHAR(1024) in a table with 'ABC' in one of the rows then
>the TCP/IP packet contains ABC and a whole load of nil's.
>
>I've posted this here as I believe it to be true (i.e. not support) and if
>it is, then it represents a major network traffic problem.
>
>Is it true? Am I missing something?
>

First, the historical perspective. The native InterBase API is
based on BLR -- binary gook -- which contains, among other things,
formal message declarations for data exchange between client and
server. The remote interface/server pick up the message declarations
to figure out what translations are required during transmission.

When a remote interface and server first meet each other they have
a preliminary chat about protocol, etc. The remote interface begins
the conversation by identifying its hardware architecture and lists
the protocol versions that it understands and how much it likes
each one. The server ponders this data and picks a protocol
to use. In addition to protocol version, there are two sub-protocols:
heterogenous and homogenous. The heterogenous protocol is used when
the client and server are different architectures and all data must
be sent in network canonical form. The homogenous protocol blasts
the message without translation. The basis for this design (long
pre-internet) was that virtually all networking was on LANs with
DMA network adaptors, that CPU cycles were a precious commodity
not to be squandered lightly, and that because of blobs, VARCHARs
were likely to be relatively short.
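
As a rough illustration of that handshake, here is a minimal Python
sketch. Everything in it is hypothetical: the names (Offer, negotiate),
the architecture strings, and in particular the selection rule. I am
assuming the server simply takes the highest-weighted version it also
supports; the actual server may weigh things differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    version: int  # a protocol version the client understands
    weight: int   # how much the client likes it (higher is better)


def negotiate(client_arch, offers, server_arch, server_versions):
    """Return (version, sub_protocol), or None if nothing is acceptable.

    Assumed rule: pick the client's best-liked version that the server
    also supports, breaking ties in favour of the newer version.
    """
    usable = [o for o in offers if o.version in server_versions]
    if not usable:
        return None
    best = max(usable, key=lambda o: (o.weight, o.version))
    # Same architecture on both ends: blast the message untranslated.
    # Different architectures: send everything in canonical form.
    sub = "homogenous" if client_arch == server_arch else "heterogenous"
    return best.version, sub
```

The point of the sketch is only that the choice of sub-protocol falls
out of the architecture comparison, while the version is chosen from
the client's weighted list.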

Given this architecture and very long VARCHAR declarations compared
with actual use, these design decisions meant that communication
between a Sun and an HP/UX machine was faster than between two
Suns or two HP/UXes.
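
To put numbers on Jason's example, here is a back-of-the-envelope
sketch in Python. The layouts are assumptions made for illustration,
not the exact wire format: a 2-byte length word followed by the full
declared width for the untranslated message, and an XDR-style encoding
(4-byte length, data padded to a multiple of 4) for the canonical form.

```python
def homogenous_size(declared_len):
    """Bytes sent for one VARCHAR when the message is sent untranslated.

    Assumes a 2-byte length word followed by the full declared width,
    used or not.
    """
    return 2 + declared_len


def canonical_size(actual_len):
    """Bytes sent for one VARCHAR in an XDR-style canonical form.

    Assumes a 4-byte length followed by only the bytes in use, padded
    up to a multiple of 4.
    """
    padded = (actual_len + 3) // 4 * 4
    return 4 + padded


# 'ABC' stored in a VARCHAR(1024):
same_arch = homogenous_size(1024)   # 1026 bytes
cross_arch = canonical_size(3)      # 8 bytes
```

Under those assumptions the "slow" translated path moves 8 bytes where
the "fast" path moves 1026.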

The world has changed since the original design. CPU cycles are
cheap and networks (worst case) are much slower.

It is probably time to either dump the homogenous protocol or to
augment it with a field-by-field variation without translation to
avoid network transmission of the unused tail bytes of a VARCHAR.
Happily, the two-phase connect protocol allows this to be done
cleanly and invisibly.
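
A field-by-field variation might look something like the following
Python sketch. This is not an existing protocol, just one possible
shape: the function names and the 2-byte length word are my
assumptions. The length goes out in native byte order, since both ends
share an architecture, followed by only the bytes actually in use.

```python
import struct


def pack_varchar(value):
    """Encode one VARCHAR as a native-order 2-byte length plus used bytes."""
    return struct.pack("=H", len(value)) + value


def unpack_varchar(buf, declared_len, offset=0):
    """Decode one VARCHAR; return (value, offset of the next field)."""
    (n,) = struct.unpack_from("=H", buf, offset)
    if n > declared_len:
        raise ValueError("length exceeds declared VARCHAR size")
    start = offset + 2
    if start + n > len(buf):
        raise ValueError("buffer too short for declared length")
    return buf[start:start + n], start + n


wire = pack_varchar(b"ABC")             # 5 bytes on the wire
value, nxt = unpack_varchar(wire, 1024)
```

No byte swapping or canonical conversion is done, so the CPU cost stays
close to that of the homogenous protocol; only the unused tail is
dropped.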

The lesson here, to be interpreted broadly, is that almost all
design decisions were based on assumptions concerning the relative
costs of various resources, and when those relationships change,
designs need to be reconsidered. The real test of an architecture
is not just how well it performs, but how well it responds to
change.

Jim Starkey