Subject Re: [firebird-support] FB Embedded: Table Field encryption questions
Author Geoff Worboys
Hi Chuck,

> So, the bottom line is pretty much your last comment:

> Basically, bring in the encrypted data and decrypt within
> the client app, which is what I've done for the previous app.

Yes.


> Then you state:

>> eg: Implement something like a full text search engine over
>> your secret data (where the index may be encrypted or not,
>> exist on the database or not). Appropriately implemented it
>> may allow you to map search requests into specific (primary key
>> based) row requests and so boost performance. A fair bit of of
>> work I suspect, but it depends how important the performance
>> issues are likely to be. (And if full-text search is already
>> a desirable feature then it may indeed be worth while.)

> I'm afraid I'm not familiar with this idea at all.

The idea was inspired by this article:
http://www.ibphoenix.com/main.nfs?a=ibphoenix&s=1113078073:347031&page=ibp_dotlucene

The idea behind full text search systems to allow easy and
efficient searching of text where ever it may exist in the
"source". You see similar systems for searching your harddisk,
an index is created that covers all the files of interest. In
the case of databases an index is created that covers all the
fields and tables of interest.

IBObjects has such an implementation:
http://www.ibobjects.com/ibofts.html
But I doubt whether you could use it (or at least not directly)
due to the other factors already discussed that require all the
data to come back to the application encrypted.


[...]
> What do you mean by "implement a full text search over my
> secret data?" What does this entail? Can you flush out your
> thoughts here a little more so I can get the idea?
[...]

I've never really studied full text search to see how it should
be done. But following is a basic outline of how I imagine you
would do it.

- Bring each encrypted record back to your application and
temporarily decrypt it (including any blobs).

- Isolate the words of interest from each field and add those
fields to a special index. For example:

records PK AFIELD BFIELD
1 "Big Thing" "Very secret"
2 "Large Thing" "Very confidential"
fts index
"BIG", [1]
"CONFIDENTIAL, [2]
"LARGE", [2]
"SECRET", [1]
"THING", [1, 2]
"VERY", [1, 2]

Note that the "fts index" listed above is not a normal FB index
on the source table. It is a separate special object created
specifically to support fts word lookup. More below.

Later on a user says they want to find "big" so you look up the
index and find that it occurs in record 1, so you only need to
bring back and decrypt that record.

Which then leaves questions about the index. Where should it
exist, and does it need to be encrypted? The answers will
depend on a lot of different factors.

In the example above the index really wont tell you much about
what is actually in the records even when left as clear text.
So it would not really need to be encrypted. And even if there
are "secret words", it is possible that your indexing system
will not be as clear and obvious as the index above.

That is; Possibly the index structure itself will be enough
obscurity, or possibly you might use soundex or other phonetic
algorithms (or perhaps even a hash system) in which case the
data in the index will be even more obscure.

As to whether the index should be in the database. You could,
for example, create a table for the index data and then define
a FB index on that table such as this really basic (and
probably wrong) example:
CREATE TABLE SECRET_LOOKUP (
WORD VARCHAR(30) PRIMARY KEY,
SECRET_TABLE VARCHAR(31),
LOOKUP_PK INTEGER );

(NB. The use of the "SECRET_TABLE" field can allow you to use
this one table to support FTS indexing over multiple tables.)
You cannot use triggers to maintain your index (as the IBO FTS
example does), because then you run into the various UDF
problems we have already discussed. So the index would need
to be maintained by the client app as rows are changed - not
ideal, but then then encrypted fields cannot be updated (with
any validity) except from your client app.

OR perhaps you have found some code you can plug in that will
normally create an fts index in a file, that file could instead
be written to a blob field and pulled back to the app when it
starts up (depending on how big it is). Given that you are
using embedded, you could also keep the index as a separate
file and access it directly.

There are lots of unanswered questions here: like whether the
index should also say what field (not just what row) the word
occurred in. Your indexing algorithm will need exclusion and
possibly phrase inclusion lists and synonym lists (see the IBO
FTS stuff noted above). Appropriate synchronisation and index
rebuild mechanisms will have to be implemented

But hopefully the above will give you the general idea.

--
Geoff Worboys
Telesis Computing