Subject | Re: [firebird-support] loading data from a .csv file? |
---|---|
Author | Kjell Rilbe |
Post date | 2014-05-24T06:09:15Z |
On 2014-05-18 21:23, Alan Shank linux@... [firebird-support] wrote:
> Is there some way to load data into a table in Firebird, like "LOAD DATA
> INFILE..." in Mysql? I have comma-delimited files from Mysql that I want
> to load into Firebird tables.
>
> I have Firebird2.5-super on Ubuntu 14.04, 64-bit.
>
Thomas mentioned external tables. I've tried a couple of other "data
pump" methods, both free and commercial import utilities as well as my
own C# import loop (Windows and .Net in that case obviously). They all
use a loop of insert statements, one for each .csv file row. While it
works fine, it's generally very slow.
I've now switched completely to external tables, or in some cases (not
so many records) generating insert statements and executing them in one
query batch. It does require that you modify the file, but it's
extremely fast!
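
For the "generate insert statements" variant, a minimal sketch (not from the
original post; the file names and target columns are just placeholders) could
look like this in Python, writing a script that you then run in one go with
isql (the Ubuntu Firebird packages typically install it as isql-fb):

import csv

# Placeholder file names and columns: adapt to your own .csv and table.
with open("input.csv", newline="", encoding="iso-8859-1") as src, \
     open("inserts.sql", "w", encoding="iso-8859-1") as dst:
    for col1, col2, col3 in csv.reader(src):
        col1 = col1.replace("'", "''")  # escape single quotes for SQL literals
        dst.write(
            'insert into "MyTable" ("SomeString", "SomeDate", "SmallInteger") '
            f"values ('{col1}', '{col2}', {col3});\n"
        )
    dst.write("commit;\n")
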
I usually modify the format to fixed width, with or without separators,
matching a FB table with char columns (with an extra dummy column for
each separator, if present, and an extra dummy column for the
linebreak). It's easiest with a character set with a fixed number of
bytes per character like iso 8859-1, ascii or win-1252, because FB
expects each column to be the same number of bytes, not characters. For
example, with utf8 it expects a char(10) column to consist of 40 bytes in
the file. The "problem" with that is that in text editors etc. such a
string will appear as anything from 10 to 40 characters, depending on
what characters you enter. If you enter ten 4-byte characters, that's
the same byte length as forty 1-byte characters. So, such a file will not
be fixed-width if you count characters. If you convert the format with
some utility, you have to pad the strings to 40 bytes, taking care to
really count bytes, not characters. With fixed-width character sets, it's
much easier.
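
To illustrate the padding, here's a rough sketch (not from the original post;
the column widths, separator and file names are made up to match the example
table below) that rewrites a .csv as a fixed-width file, counting bytes rather
than characters:

import csv

widths = [10, 25, 7]      # byte widths of the data columns in the external table below
encoding = "iso-8859-1"   # fixed 1 byte per character, so bytes and characters coincide

with open("input.csv", newline="", encoding=encoding) as src, \
     open("fixed.dat", "wb") as dst:
    for row in csv.reader(src):
        fields = []
        for value, width in zip(row, widths):
            raw = value.encode(encoding)[:width]   # truncate values that are too long
            fields.append(raw.ljust(width, b" "))  # pad with spaces, counting bytes
        dst.write(b";".join(fields) + b"\n")       # 1-byte separators and a 1-byte linebreak
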
Then use an external table like this (if separators are present and line
breaks on your system are a single character):

create table "External" external file 'yourfilepath' (
"DataCol1" char(10) character set iso8859_1,
"Sep1" char(1) character set iso8859_1,
"DataCol2" char(25) character set iso8859_1,
"Sep2" char(1) character set iso8859_1,
"DataCol3" char(7) character set iso8859_1,
"Linebreak" char(1) character set iso8859_1
);
Then import into your real table, making sure the char columns are
appropriately converted:

insert into "MyTable" (
"SomeString", -- varchar(10)
"SomeDate", -- timestamp
"SmallInteger" -- int
)
select
trim(cast("DataCol1" as varchar(10))) "SomeString",
cast("DataCol2" as timestamp) "SomeDate",
cast("DataCol3" as int) "SmallInteger"
from "External";
Good luck!
Kjell