Subject Re: [IB-Architect] Identifier naming woes
Author David Jencks
Hi,
On 2001.05.24 19:38:03 -0400 Jim Starkey wrote:
> At 04:15 PM 5/24/01 -0700, Jason Wharton wrote:
> >
> >I was proposing a way to efficiently allow column names to be longer than
> >32 characters. If we allowed them to be up to 256, that would be a nice
> >jump, but it would bloat the system significantly, since everything is so
> >tied to the column name instead of a simple key that stands in for it. It
> >would also bloat the amount of information that crosses the wire when
> >preparing statements. I'm not in favor of bloating things, and I'm not in
> >favor of living within narrow constraints either. Something needs to
> >give... That's when the cost-benefit of a layer of indirection is
> >measured. All I'm suggesting is that the costs be measured.
> >
>
> There is a simple trick that obviates the need for identifiers. The
> basic idea is that there is a single (internal) registry for "tokens"
> (canonical strings). The registry is a simple hash table of known
> strings. An arbitrary string is passed into the registry. If it
> matches a known string, the address of the string is returned; otherwise
> the new string is copied into the registry and its address is returned.
>
> All strings associated with database objects (from the parser, metadata
> handlers, etc.) are sanitized through the string registry. Two
> "tokens", therefore, can be correctly compared for equality by
> comparing their addresses. A hash table of object names can be
> maintained by taking the address of each token modulo the table size.
>
> The string registry has a huge number of benefits:
>
> 1. Each string is represented at most once.
> 2. Equality comparisons are a simple longword compare.
> 3. Case tweaking is done in one place.
>

(3 is not a huge number. Maybe you mean a number of huge benefits ;-)

> Ditch the ids and go with tokens. Less generated code, faster
> execution, less memory used (not that this matters anymore).
>
> I don't think a handful of bytes on a 100 Mb Ethernet wire is
> worth worrying about.
>

At first I hated this idea, but while trying to explain why, I realized how
good it is. My remaining question is: what happens when the hash table
needs to expand (e.g. using extensible linear hashing)? My knowledge of
hashing is not stellar, but I don't know of any scheme that expands the
hash table without moving some items. Wouldn't this require updating all
the moved token values? I suppose you could keep references to each
location where the token is used... but this is starting to sound
complicated.

Also, this might make the system tables kind of unreadable, unless you also
tokenized all the string values in the data. I did read about a project in
France in the early 1980s that tried this, but I don't know how it worked
out. I suppose you could keep all the current columns and add tokenized
versions that are actually the ones used.

Would this require a major overhaul of most of Firebird's internals, or just
a minor change?

david jencks


<snip>