Subject Re: [IB-Architect] Identifier naming woes
Author David Jencks

I went back to being confused.

On 2001.05.25 09:16:14 -0400 Jim Starkey wrote:
> At 10:18 PM 5/24/01 -0400, David Jencks wrote:
> >>
> >> There is a simple trick that obviates the need for identifiers. The
> >> basic idea is that here is a single (internal) registry for "tokens"
> >> (cannonical strings). The registry is a simple hash table of known
> >> strings. An arbitrary string is passed into the registry. If it
> >> matches a known string, the address of the string is returned,
> otherwise
> >> the new string is copied into the registry and its address returned.
> >>
> >> All strings associated with database objects (from the parser, meta-
> >> data handlers, etc) are sanitized through the string registry. Two
> >> "tokens", therefore, can be correctly compared for equality by
> >> comparing their addresses. A hash table of object names can be
> >> maintained by taking the address of the tokens mod table size.
> >>
> >
> >
> >At first I hated this idea, and when trying to explain why I realized
> how
> >good it is. My remaining question is, what happens when the hash table
> >needs to expand (e.g. using extensible linear hashing)? Not that my
> >knowledge of hashing is stellar, but I don't know of any scheme that
> >expands the hash size without moving some items. Wouldn't this require
> >updating all the moved token values? I suppose you could keep
> references
> >to each location the token is used.... but this is starting to sound
> >complicated.
> >
> The smart thing to do is allocate a large buffer(s) to hold the
> string, so the strings wouldn't need to be moved (which, of course,
> would be impossible anyway). This is important for other reasons,
> as well. Memory allocations are surprisingly expensive. Not
> only is there the internal structure of the heap, but also the
> need to interlock the heap against thread switches. Managing
> the string memory as part of the token registry is simple and
> virtually free. (The very clever hacker will note that a
> hash table can be safely "read" without a lock, which is a
> big plus.)

How do you know how large is large enough? If I start with an empty db and
proceed to create 2 billion empty tables, what happens?

> I've never been a fan of fancy hashing schemes. There are two
> cases. First, the access cost to a node is cheap (e.g. in memory),
> so who cares?
cares about what?
> Second, the access cost is high (e.g. on disk),
> in which case hash always loses (this is a fun argument).
loses to what?

I'm not sure what you're getting at. The only purpose I know of for things
like extensible linear hashing is to solve the "how large is large enough"
question. Is there another way? As I said, I'm not an expert.
> >Also, this might make the system tables kind of unreadable, unless you
> also
> >tokenized all the string values in data. I did read about some project
> in
> >france in the early 80's that tried this, but I don't know how it worked
> >out.
> >
> No, this is only an in-memory symbol lookup system. No change in the
> ODS, a net reduction in the lines of code.
> >Would this be a major revamping of most of firebird's internals or just
> a
> >minor change?
> >
> Mostly removal of ughly code. It could be done in conjunction
> with the addition of multi-level namespaces.

Now I'm sure I don't understand which operations this would help. Can you
be specific and perhaps point out which files the ugly code is in?

Are multi-level namespaces kind of like SQL schema but better?
(level1.level2.level3.myrealobject or ?)

> Jim Starkey
> To unsubscribe from this group, send an email to:
> Your use of Yahoo! Groups is subject to