Subject INTL plugins
Author Samofatov, Nickolay
Hi, All!

I just had an online meeting with Adriano where we discussed problems
with INTL plugins.
The workable solution to the problems we identified seems to be creating
a manifest file for each INTL plugin. If a user needs to disable some of
the charsets/collations of a plugin (probably to resolve conflicts), he
can comment out portions of that plugin's manifest file. By default, the
engine brings in everything listed in all INTL plugins discovered
according to the config files.
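
To illustrate how commenting out manifest entries could work, here is a
minimal sketch in Python. The manifest syntax (one "charset NAME" or
"collation NAME" entry per line, with '#' starting a comment), the
function name parse_manifest and the entry names are all assumptions made
for this example; no format was agreed in the meeting.

```python
def parse_manifest(text):
    """Return the (kind, name) entries enabled in one plugin manifest.

    Hypothetical format: one "charset NAME" or "collation NAME" per line.
    Everything after '#' is a comment, so a commented-out entry is skipped.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        parts = line.split()
        if len(parts) != 2 or parts[0].lower() not in ("charset", "collation"):
            raise ValueError("bad manifest line: %r" % raw)
        entries.append((parts[0].lower(), parts[1].upper()))
    return entries


# Illustrative manifest; the names are made up for this sketch.
EXAMPLE_MANIFEST = """\
# manifest for a hypothetical plugin
charset   MY_CHARSET
collation MY_COLLATION
# collation MY_OTHER_COLLATION   (disabled by the user to resolve a conflict)
"""
```

With this sketch, parse_manifest(EXAMPLE_MANIFEST) yields only the two
entries that are not commented out.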

Here is the entire meeting transcript (letter is HTML to have transcript
readable):
========================================================================


Adriano (1:04 PM) :

Hi Nickolay!

skidder (1:05 PM) :

Hi, Adriano!

skidder (1:05 PM) :

You raised interesting issue regarding CREATE DATABASE

Adriano (1:05 PM) :

This is what I already told about the "system charset"

skidder (1:06 PM) :

actually, CREATE DATABASE is supposed to be handled on client side, but
for isc_create_database issue is present too

Adriano (1:07 PM) :

But if server can load a charset with only the charset name, problem can
be fixed, right?

skidder (1:08 PM) :

but this raises sharp issue regarding charset name ambiguity

skidder (1:09 PM) :

imagine 2 plugins implementing the same charset

Adriano (1:09 PM) :

Can be resolved in the configuration file too

skidder (1:09 PM) :

this is not nice

skidder (1:09 PM) :

relying on config files too much is bad

skidder (1:10 PM) :

things are supposed to work by default :-)

Adriano (1:10 PM) :

And users are not supposed to have to do difficult things ;-)

skidder (1:11 PM) :

yup :-)

skidder (1:13 PM) :

I think to handle the problem we need to ask user to specify charset
module name if he wants to create database with non-standard charset

skidder (1:14 PM) :

current alternative is to create DB using standard charset and then
change default charset for database to non-standard one

Adriano (1:16 PM) :

I do not understand your last phrase

skidder (1:17 PM) :

let me explain, how Dave Schnepper recommends to use non-standard
charsets now:

skidder (1:17 PM) :

1) create DB using standard default charset

2) register new charsets

3) change DB charset to non-standard one

Adriano (1:18 PM) :

And the filename?

skidder (1:19 PM) :

when you create database at step 1 you are using standard charset which
may be transcoded to system charset by the server

Adriano (1:21 PM) :

I don't like this.

If charsets are registered in the configuration file, they can be
registered at database creation.

Adriano (1:21 PM) :

And allow user to connect with all charsets.

skidder (1:22 PM) :

1) agree, we need to work out better solution

2) disagree, custom charsets should not stick to database silently

3) maybe

Adriano (1:22 PM) :

I can agree with your (2) :-)

skidder (1:25 PM) :

we may allow using "CHARSET@MODULE" as connection and database charset

Adriano (1:26 PM) :

A client needing to know about server things is not nice.

skidder (1:27 PM) :

for the brief moment of database creation client *has* to know things
about server

skidder (1:28 PM) :

and after database is created problem disappears

skidder (1:29 PM) :

what do you think?

Adriano (1:31 PM) :

Imagine the situation:

Server 1 has a plugin named "fbintl2.dll"

Server 2 has *another* plugin named "fbintl2.dll"

A user wants to move a database from server 1 to server 2.

He can't do this with your proposal.

skidder (1:32 PM) :

naming plugins like fbintl2.dll should be discouraged

skidder (1:32 PM) :

plugin names need to be unique, or at least reflect their contents

Adriano (1:33 PM) :

Charsets/collations names generally will be unique too.

skidder (1:34 PM) :

You see, to be convenient system needs to be installable via file copy
operations.

Needing to move parts of configs around complicates life a lot

Adriano (1:35 PM) :

We can search for all plugins and don't allow ambiguities.

We can allow $(ROOT)/intl/* in the configuration file.

skidder (1:35 PM) :

you see, charset/collation names are standardized by IANA, ISO and
other organizations.

if package implements a collection of charsets, another package is very
likely to intersect with first

skidder (1:38 PM) :

The real problem with Firebird and Vulcan config files is that they are
assumed to be hand-edited.

Installers and users are going to have difficulties with that

skidder (1:39 PM) :

Firebird 1.5 works quite well with all defaults, and you can do
everything with database without ever touching any config file.

This is good

Adriano (1:41 PM) :

This can be done by default in the config file - load all plugins of
intl directory. This is not a problem.

If someone changes it, it is because he has some reason for the change.

skidder (1:43 PM) :

this is also not nice

skidder (1:44 PM) :

problem is that you may lose track where charset or collation comes from

skidder (1:47 PM) :

I'm looking now at how Oracle and other engines handle the problem

skidder (1:47 PM) :

all engines allow pluggable charsets/collations :-)

Adriano (1:47 PM) :

And how do they implement it?

skidder (1:48 PM) :

it appears that for Oracle you need to create charset plugin, one
charset per plugin

skidder (1:49 PM) :

you need to register plugin at Oracle and confirm that its name is
unique

skidder (1:50 PM) :

then you need to copy charset definition (.nlb) to NLS directory

Adriano (1:52 PM) :

I think fbintl.dll doesn't need to be a special thing.

If we go in your way (@module) we need to treat fbintl as default.

This is not nice IMO.

skidder (1:53 PM) :

this is ok, this is no problem. give me a couple minutes to look at
Oracle solution in more detail

skidder (1:56 PM) :

they have kind of "boot files" which specify which NLS objects to load

Adriano (1:57 PM) :

"boot files" is like config files?

skidder (1:57 PM) :

more like plugin

skidder (1:58 PM) :

look here:

http://www.csee.umbc.edu/help/oracle8/server.815/a67789/appb.htm#948177

skidder (1:59 PM) :

this is old doco, fresh Oracle allows to have pluggable collations if I
remember correctly

skidder (2:06 PM) :

I looked at Oracle 9 live install, they have user-defined collations

Adriano (2:07 PM) :

It's more like the config way. It doesn't store module names in the
database, if I understand correctly.

skidder (2:08 PM) :

yup, it looks so

skidder (2:08 PM) :

but it doesn't have configs either

skidder (2:09 PM) :

you put a set of *.nlb files to NLS folder and get happiness

skidder (2:13 PM) :

btw, another interesting place to look is to see what Oracle/RDB stores
in RDB$COLLATIONS and how :-)

skidder (2:15 PM) :

FYI, Oracle/RDB looks much like Interbase/Firebird and has pluggable
collations and support for standard syntax there

Adriano (2:16 PM) :

Is Oracle/RDB an old version, or does a more up-to-date version exist?

skidder (2:16 PM) :

Oracle/RDB is Jim's first creation, before Interbase

skidder (2:16 PM) :

it is maintained by Oracle

skidder (2:16 PM) :

fresh versions of it exist

Adriano (2:19 PM) :

I think we need to explain the two approaches, how they resolve the
problem in question, in firebird-architect.

So that people do not talk about the "politburo" again and do listen to
others' opinions.

skidder (2:20 PM) :

what do you mean "politburo"?

Adriano (2:21 PM) :

Claudio said this in fb-admin.

I think it is this: http://en.wikipedia.org/wiki/Politburo

skidder (2:22 PM) :

you see, mailing lists are very bad media

skidder (2:22 PM) :

you say one thing, but actually meant something slightly different

skidder (2:22 PM) :

this takes days and hours to resolve

skidder (2:23 PM) :

at the same time issue in question is forgotten

skidder (2:24 PM) :

both approaches we have now are half-baked and flawed, we need to find
out something clean and workable and present for public review

Adriano (2:25 PM) :

1) I see, nothing reaches a conclusion on the Firebird lists. But people
will find out after it is implemented and will question why the approach
was chosen.

2) Let's continue to think.

Adriano (2:26 PM) :

And what about the descriptor length?

skidder (2:27 PM) :

I'm afraid that desc length is going to be another discussion, while we
haven't finished the plugins issue

skidder (2:27 PM) :

what about continue brainstorming?

Adriano (2:28 PM) :

I don't have another solution at the moment. Do you have?

skidder (2:29 PM) :

kind of :-)

skidder (2:29 PM) :

you see, collations are simpler and may be defined however we want them
to

skidder (2:29 PM) :

problem is with charsets

Adriano (2:31 PM) :

1) but it appears we have not agreed on the solution for ambiguities

skidder (2:31 PM) :

for collations?

Adriano (2:32 PM) :

Yes. More than one plugin can have a collation with the same name.

skidder (2:32 PM) :

collations may be registered in database without difficulties :-)

Adriano (2:33 PM) :

But what if we don't have RDB$MODULE_NAME in RDB$COLLATIONS and more than
one driver implements a collation with the same name?

skidder (2:33 PM) :

if we have RDB$MODULE_NAME we have no problems in this case :-)

skidder (2:35 PM) :

problem for charsets is that they need to be used before any db is
present

Adriano (2:35 PM) :

This is why I say we don't agree. I think ambiguities should not be
resolved with RDB$MODULE_NAME

skidder (2:36 PM) :

wait a sec, we have no technical difficulties with collations

skidder (2:36 PM) :

another alternative, at least partial would be to use UTF8 in CREATE
DATABASE

Adriano (2:37 PM) :

This is not good.

skidder (2:37 PM) :

yup. INTL on client side, etc

Adriano (2:37 PM) :

Yes, with INTL in client this is good.

skidder (2:38 PM) :

but INTL on client side is not :-)

skidder (2:40 PM) :

the only problem with your proposal is that config file editing
complicates installation of INTL plugins

Adriano (2:40 PM) :

We can have a default: intl/*

skidder (2:42 PM) :

otherwise what?

skidder (2:42 PM) :

list of charsets and collations imported from each plugin?

Adriano (2:42 PM) :

Huh?

skidder (2:43 PM) :

what is going to happen in non-default case?

Adriano (2:44 PM) :

Only the plugins configured or only the built-in charsets/collations
will be available.

skidder (2:45 PM) :

what if you need some symbols from one plugin and some symbols from
another?

skidder (2:45 PM) :

BTW, there is another (theoretical) alternative - implement each
charset/collation in separate file :-)

Adriano (2:46 PM) :

Not nice.

Adriano (2:49 PM) :

Your @module is not nice too because "module" will be case-sensitive

skidder (2:49 PM) :

it is ok

skidder (2:49 PM) :

but not nice :-)

skidder (2:50 PM) :

I think Borland would store installed charsets in isc4.gdb, they use it
to store such config information

skidder (2:52 PM) :

BTW, Services API suffer from the same file names problem

skidder (2:55 PM) :

what about having manifest file describing contents of plugin?

Adriano (2:56 PM) :

For what it will be good?

skidder (2:56 PM) :

it looks like INTL has to be initialized server-wide

skidder (2:57 PM) :

it looks like for clean operation we need to load plugins, via including
multiple config files

Adriano (2:58 PM) :

To disable overlap items?

skidder (2:59 PM) :

yes, manifest file describes all contents of plugin. everything defined
in manifest is loaded by the engine

skidder (3:00 PM) :

if user finds conflict he may comment out some manifest entries to
resolve the conflict

Adriano (3:02 PM) :

Should this be easy to implement for the moment - many config (or
manifest) files?

skidder (3:02 PM) :

this is trivial in any case :-)

Adriano (3:02 PM) :

Don't look at FB config system.

skidder (3:03 PM) :

don't look at it :-)

skidder (3:03 PM) :

it is not going to help you

skidder (3:04 PM) :

But having manifest file for plugin is good from documentation point of
view - it describes exactly which charsets are contained in plugin

skidder (3:04 PM) :

and collations

Adriano (3:05 PM) :

If there is a conflict, the server doesn't start?

skidder (3:06 PM) :

yes, writes error message to log and raises exception

Adriano (3:07 PM) :

And no RDB$MODULE_NAME, right?

skidder (3:07 PM) :

right

skidder (3:08 PM) :

also, for user would be easy to look at manifest file to see where it
loads charset/collation from

Adriano (3:08 PM) :

Can you summarize the discussion to present it to the public?

skidder (3:09 PM) :

oke

========================================================================
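
The startup check agreed at the end of the transcript (everything listed
in the manifests is loaded, and a name declared by two plugins is an
error) could look roughly like the Python sketch below. The names
IntlConflictError and build_registry and the shape of the input are
assumptions for illustration only. Writing the error to the log is left
out.

```python
class IntlConflictError(Exception):
    """Raised when two plugins declare the same charset or collation."""


def build_registry(manifests):
    """Map each (kind, name) entry to the plugin that provides it.

    `manifests` is a hypothetical structure: a dict from plugin name to
    the list of (kind, name) entries enabled in that plugin's manifest.
    A duplicate entry raises IntlConflictError instead of being resolved
    silently.
    """
    registry = {}
    for plugin in sorted(manifests):  # fixed order keeps messages stable
        for entry in manifests[plugin]:
            owner = registry.get(entry)
            if owner is not None:
                raise IntlConflictError(
                    "%s %s is declared by both %s and %s"
                    % (entry[0], entry[1], owner, plugin))
            registry[entry] = plugin
    return registry
```

The user would resolve such an error by commenting out the duplicate
entry in one of the two manifests, after which the same call succeeds.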

Nickolay Samofatov


