Subject:   RE: [firebird-support] RTRIM Overhead
Author:    Helen Borrie
Post date: 2007-10-16T02:01:07Z
At 10:30 AM 16/10/2007, you wrote:
>Helen,
>
>I've imbedded answers to your questions below as well as added some
>questions.
>
>If there are only one or two input fields that need such a large
>container, make some more declarations, e.g., RTRIM30, etc., so that
>you reduce the number of huge padded strings being pushed around in memory.
>          When the RTRIM function is invoked, doesn't the input
>parameter's size dictate how much data is being pushed around?

You have the argument declared as 32Kb. That means each invocation
of the UDF requires an allocation of 32Kb.

>If RTRIM is
>called passing it say a char(10) field, why would it push around more than
>11 bytes including the null delimiter?

The UDF doesn't know you're only going to use 11 or 13 bytes...
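For comparison, narrower declarations would look something like this -- just a sketch; the entry point and module assume the standard ib_udf library, RTRIM11 is an invented name alongside the RTRIM30 example, and the sizes are only illustrative:

-- what you have now, roughly: every invocation allocates the full 32Kb
-- (adjust to however it is actually declared)
DECLARE EXTERNAL FUNCTION RTRIM
  CSTRING(32767)
  RETURNS CSTRING(32767) FREE_IT
  ENTRY_POINT 'rtrim' MODULE_NAME 'ib_udf';

-- narrower variants for the small fields, same entry point
DECLARE EXTERNAL FUNCTION RTRIM11
  CSTRING(11)
  RETURNS CSTRING(11) FREE_IT
  ENTRY_POINT 'rtrim' MODULE_NAME 'ib_udf';

DECLARE EXTERNAL FUNCTION RTRIM30
  CSTRING(30)
  RETURNS CSTRING(30) FREE_IT
  ENTRY_POINT 'rtrim' MODULE_NAME 'ib_udf';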

>What was the problem with the autocast in the initial run, that
>caused you to resort to trimming?
>          The problem was that trailing blanks were not being removed
>when the data was being inserted into the internal tables. I assume that
>the autocast would convert a CHAR from the external table into a VARCHAR in
>the internal table.

It sounds to me more like your source CHAR fields have nulls at the
end of the field, not at the end of the data, i.e., you have no
trailing spaces in the CHARs, because they are all embedded between
the last non-blank character and the null terminator.

>I would much rather use the autocast feature than
>RTRIM, if this would accomplish the objective of eliminating unneeded data
>from the end of character strings that is inserted into VARCHAR columns.

CHAR(n) to VARCHAR(n) should autocast....

>I'd appreciate if you'd let me know under what
>circumstances the autocast would either handle null delimited strings or
>remove trailing spaces?

A null terminator is a character and autocast won't treat it as a
blank. So, if your char(11) is e.g. xxxx*bbbbbb (where * is the NT
and the b's are blanks) then the varchar(11) would store as
xxxx*. If you are seeing xxxxbbbbbb* in the destination data then
the null terminator in the source field isn't where you
thought....absent any other theory I can think of.

>          I've also considered defining the columns in the external
>table as VARCHAR. The problem with this approach is that alignment (also
>called slack) bytes are added so that the two byte size field that precedes
>the character data in a VARCHAR is aligned to an even byte boundary. This
>would be fine, except that I've experienced inconsistencies in FB's
>generation of the slack bytes in the past and have therefore stayed away
>from placing binary data into external tables.

I haven't seen that; but I have seen where, if the last field of the
input line was defined as varchar and was defined too short to
accommodate the longest possible input, the right-hand end of the
input line could spill over into the next line and mess up the
alignment from there on. For this reason I prefer to use CHAR for
mapping the input line, where over-length data will cause an
exception. But I've never had to work with null-terminated fields in
input data, so I don't know any secrets about dealing with that.
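For what it's worth, a bare-bones sketch of what I mean by mapping the input line with CHAR and letting the engine autocast into the VARCHAR columns -- the file, table and column names here are invented, and the two-byte EOL column assumes a Windows-style CRLF text file:

-- external table: fixed-width CHAR columns matching the input line layout
CREATE TABLE ext_input EXTERNAL FILE 'C:\load\input.dat' (
  cust_code  CHAR(11),
  cust_name  CHAR(30),
  eol        CHAR(2)   -- carriage return + line feed ending each input line
);

-- internal table: VARCHAR columns of the same declared lengths
CREATE TABLE customer_load (
  cust_code  VARCHAR(11),
  cust_name  VARCHAR(30)
);

-- the CHAR(n) to VARCHAR(n) autocast happens on the insert
INSERT INTO customer_load (cust_code, cust_name)
  SELECT cust_code, cust_name FROM ext_input;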

>You said it was a large input file (as it must be, to take so long) -
>how frequently are you committing during the run? are there indexes
>in the destination file?
>          Currently, up to about 2gb of data is written to a
>sequential file. When a sequential file reaches this size (or the end of
>data is encountered) the sequential file is inserted into an internal table
>with a single SQL statement. Please let me know if the threshold for the
>size of the sequential files should be reduced from 2gb and if so, to what
>size?

You didn't respond to Sean's question about the record size; but a
kind of "comfortable average" batching size per transaction for bulk
inserts is around 8000 records. If you have umpteen fields all
potentially 32Kb, it would be fewer (maybe a LOT fewer).
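If you want to experiment with smaller batches per transaction, something along these lines in an isql script shows the commit pattern I mean. It reuses the invented names from the sketch above; ROWS needs Firebird 2.0, relies on the external file being read in its physical order, and each chunk re-reads the file from the top, so treat it purely as an illustration of batching, not an optimised load:

-- first batch of ~8000 records in its own transaction
INSERT INTO customer_load (cust_code, cust_name)
  SELECT cust_code, cust_name FROM ext_input
  ROWS 1 TO 8000;
COMMIT;

-- next batch, and so on until the file is exhausted
INSERT INTO customer_load (cust_code, cust_name)
  SELECT cust_code, cust_name FROM ext_input
  ROWS 8001 TO 16000;
COMMIT;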

>          In one of the tables, there is one column defined as a
>primary key. In all the other tables (about 140) the primary key is also
>defined as a foreign key. There are no other indexes.

It gets more and more complicated, then. This upload is from one
file to 140 tables? and 139 tables have active foreign keys to the
140th one? Or 139 tables have self-referencing FKs while one doesn't?

>Thanks for your help.

I think maybe "help" isn't the right word, so far. ;-)
./hb