Subject RE: [firebird-support] searching a varchar(100) column
Author David Johnson
On Sun, 2005-05-01 at 10:18 +1000, Helen Borrie wrote:
> At 09:11 AM 30/04/2005 +1000, Alan McDonald wrote:
> > > The problem is that "like" cannot use an index, so everything becomes a
> > > table space scan.
> > >
> > > I don't know how this will work - It is similar to a couple of
> > > suggestions I have already seen in response to your question.
> > >
> > > 1. make the primary key of your book table something other than the
> > > title. ISBN is a good choice. I'll just call this <<primary key>>
> > >
> > > 2. create a separate keyword to book relationship table, that is
> > > structured as keyword, <<primary key>>. I would be tempted to make its
> > > primary key both of its columns.
> >
> >too much disk space/DB pages used for the index. you're always better off
> >here with a surrogate integer key as PK. Comments?
>
> I always prefer an atomic primary key. Neither book title nor ISBN is
> atomic - both come from external sources and are subject to variation and
> human error.


>
> Book title is definitely a bad choice, thanks to the way books are
> listed. If the title is listed in one source as "Firebird Book, The" and
> in another "The Firebird Book", what rule are you going to use?

Most search engines are coded to ignore "The", "A", and semantically
similar constructs, or at least to give them a low weighting. In RDBMS
terms, this amounts to rejecting the title as a primary key.

>
> ISBN is also variable and subject to change. On the book, you see
> "1-59059-279-4". Do you key on this string and hope everyone gets the
> hyphens in the right positions? or do you eliminate the hyphens (as one
> could do by applying a fairly complicated trigger to inserts and updates)
> and hope that searchers understand that you don't store hyphens in ISBNs?

Searchers do not generally have access to the SQL. The front end
application can strip out hyphens easily enough, and replace them on
display, even if you write it in assembly. In Object Pascal (Delphi) or
C it is straightforward. In Java, Perl, Python, and other languages it
is trivial - something along the lines of keyval = isbn.replaceall
("-","")


> The ISBN of a book will change if another publisher takes it up, as occurs
> quite frequently with technical books. The original publisher goes broke
> and reprints are issued by a new publisher, without the book itself
> changing at all. Not all publications have ISBNs, either.

Good point.

Furthermore, ISBN is specific to the format - three copies of the same
book, one in paperback, one in trade edition, and one in hard cover,
will each have different ISBN's. And if you get a volume that contains
three distinct titles in the same cover, that represents yet another
variation. ISBN makes sense in a book store's inventory tracking
system, which must be sensitive to specific print runs, But it would
not be at all good for general searching for a title.

>
> On the size issue, an ISBN without the hyphens is 9 bytes, which doesn't
> make it a big loser against a BigInt (8 bytes).

The 9 _digit_ ISBN can be converted to a BigInt before storage. Strip
out the hyphens and it is simply a piece of numeric data. I would still
make it a BigInt since INT only gets you into the 4xx,xxx,xxx range in 9
digit numbers.