firebird-architect - Re: [firebird] Re: SV: [Firebird-Architect] Indexes for big objects

Subject	Re: [firebird] Re: SV: [Firebird-Architect] Indexes for big objects
Author	Jim Starkey
Post date	2006-07-09T21:19:59Z

Pavel Cisar wrote:

> Andrew Berg wrote:
> > Oh, please. That is like saying "I can't see why it would be good to
> > provide stored procedures or integrity constraints when you can
> > easily solve that at the 'application' level."
>
> That's completely different matter. Stored procedures and integrity
> constraints are valuable to 99.9% applications, so even doubling the
> code base to provide it is a good deal.
>
> While I agree that index on very long fields could be needed sometimes,
> it's definitely far from usability in at least 50% applications (I would
> be glad to be proved wrong).
>
> Look, you see the problem from application developer point of view.
> *You* need the feature, so you're all for it, but FB developers must
> consider whether it's a good deal. Introducing features means more code
> to maintain, more complexity and execution pathways introduced that
> translates to bloat, potential for tough bugs and potentially problems
> introducing other new features into now more complex system down the
> road. If it would solve common problem (like SP's), it's good deal
> anyway, but to solve problem that only 1% users would have and for what
> easy workaround exists, it's a very bad deal. And it goes right against
> basic rules of *Firebird* development, which goal is to develop slick,
> fast, simple (in maintenance and usage) RDBMS. It's how Firebird
> differentiate itself from other engines.
>
> Anyway, I think that this particular feature we discus now is badly
> engineered. It supports only exact matches, which means that one has to
> send the *whole* long value over the wire, which is IMHO very bad idea.
> It's really usable only in PSQL, where the value manipulation happens
> only on the server, but manipulating long values would be problematic
> anyway (more memory consumption for value buffers, some routines must be
> checked and adjusted to work with potentially unlimited length values
> etc.). We have to check whether it will not cause problems in current
> code base to implement this.
>
> To consider it as a good proposal, it has to:
>
> 1. Solve the memory and over-the-wire efficiency.
> 2. Prove that at least uniqueness is really guaranteed (not all hashes
> would work here).
> 3. Should also offer partial matches (at least for START WITH).
> 4. Why stop only on very long fields? Good solution for indexing 32k
> fields should also work for BLOBs, thought.
>
> We can also consider to solve the problem (like you described) by other
> means than by support for long-key indices (or only by index). We can
> introduce new SQL constructs, predicates, system functions, etc.
>
> best regards
> Pavel Cisar
> IBPhoenix
>
> _

> .
>

I'm less than convinced of the utility. Yes, it is certainly quite
useful to know what something rather large like a Word document, PDF, or
jpeg is a duplicate, but I don't think that is enough -- you want to
know what it is a duplicate of. I have trouble thinking of an
application where it is sufficient to say, "nope, won't do that", rather
a) linking to the previously stored version, or b) telling the use that
that object already exists and giving him a pointer to it.

Computing a hash and populating a hash field is easily done by trigger.
So is checking for duplicates. But in the final analysis, most
application will want to handle duplicates with context specific
algorithms, so I don't see much value in building the feature into Firebird.

>

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376