Subject Re: [firebird-support] Batch Insert from external table
Author Helen Borrie
At 04:22 PM 21/05/2005 +0000, you wrote:
>I'm working on a project that imports data from external data sources
>into a firebird (V1.5.2) database.
>
>I found out that the fastest way to do that was to use external
>tables.
>
>I put the data to ascii files, create an external table that reads
>from that file and I execute the following statement:
>
>INSERT INTO MYTABLE (COLUMN1, COLUMN2, ...)
>SELECT NULLIF(COLUMN1,''), NULLIF(COLUMN2,''), ...
>FROM MYEXTERNALTABLE
>
>That works fine and realy fast!
>
>The problem is with the columns of type char or varchar. I noticed
>that
>in MYTABLE all the similar typed columns where right-padded with
>spaces
>to the full length of each column as declared in the external table.

As is expected with fixed length input fields...

CHARs are always padded to the full length, anyway. VARCHARs with trailing
blank-padding will be stored as the "nett" string and the actual length of
the incoming data. When the varchar is retrieved, the value will be
returned including enough blanks at the end to make the value up to the
stored length.


>To solve that I used the rtrim UDF from ib_udf.dll and my statement
>became like this:
>
>INSERT INTO MYTABLE (COLUMN1, COLUMN2, ...)
>SELECT NULLIF(RTRIM(COLUMN1),''), NULLIF(RTRIM(COLUMN2),''), ...
>FROM MYEXTERNALTABLE
>
>The results where the same!! The only difference was that now the
>columns where padded to the size of the column in MYTABLE.

That should be the case with CHARs, but not with VARCHARS... How are you
testing the results of your conversions?

Could you try a reality check with this:

SELECT COALESCE(COLUMN1,''<null>') as F1,
COALESCE(COLUMN2,'<null>') as F2
FROM MYTABLE

to test whether you are either 1) getting variable results or 2) using a
retrieval tool that substitutes "zero-values" for nulls in its output.


>I tried the following:
>
>INSERT INTO MYTABLE (COLUMN1, COLUMN2, ...)
>SELECT RTRIM(COLUMN1), RTRIM(COLUMN2), ...
>FROM MYEXTERNALTABLE
>
>Now the length of the resulted columns were the correct but, as I
>expected I had no null values but '' (empty strings)

Go back to your right-trimming solution, but use a CASE expression to
simulate what NULLIF is evaluating under the hood:

INSERT INTO MYTABLE (COLUMN1, COLUMN2, ...)
SELECT CASE
WHEN (RTRIM(COLUMN1) = '') THEN '<empty>'
ELSE RTRIM(COLUMN1) END,
WHEN (RTRIM(COLUMN2) = '') THEN '<empty>'
ELSE RTRIM(COLUMN2) END,
FROM MYEXTERNALTABLE

If you see '<empty>' values in your results in the expected places, you
will know that the engine is actually receiving an empty string where you
expect it; if you see blanks where you expected empty strings, then
something else in your input data is interfering with the rtrim.

>An other problem is that the rtrim UDF, is limited to CSTRING(255),
>something that may not cover all my cases.
>
>Does anyone have an idea how to overcome tha problem?

You can declare the same UDF in your database as many times as you
like: only the NAME of the UDF you declare must be unique. So, for
example, you can have all of these declarations (or any other lengths up to
the maximum varchar size):

DECLARE EXTERNAL FUNCTION rtrim_short
CSTRING(80)
RETURNS CSTRING(80) FREE_IT
ENTRY_POINT 'IB_UDF_rtrim' MODULE_NAME 'ib_udf';

DECLARE EXTERNAL FUNCTION rtrim_255
CSTRING(255)
RETURNS CSTRING(255) FREE_IT
ENTRY_POINT 'IB_UDF_rtrim' MODULE_NAME 'ib_udf';

DECLARE EXTERNAL FUNCTION rtrim_1024
CSTRING(1024)
RETURNS CSTRING(1024) FREE_IT
ENTRY_POINT 'IB_UDF_rtrim' MODULE_NAME 'ib_udf';

[horrible]
DECLARE EXTERNAL FUNCTION rtrim_max
CSTRING(32265)
RETURNS CSTRING(32265) FREE_IT
ENTRY_POINT 'IB_UDF_rtrim' MODULE_NAME 'ib_udf';

Also: make sure you are using a proper version of ib_udf, and not some
relic from IB6 or a Firebird 1 beta. RTrim and LTrim had long-standing
bugs that were fixed (AFAIR) at some point late in the Firebird 1.0 beta cycle.

./heLen