Subject | Re: [firebird-support] Re: What are the trade-offs of CHAR vs. VARCHAR? |
---|---|
Author | Mark Rotteveel |
Post date | 2019-11-14T19:21:12Z |
On 14-11-2019 18:01, blackfalconsoftware@... [firebird-support]
wrote:
humans, and humans are fallible. This is probably an incorrect
(re)interpretation of what the InterBase 6 Data Definition guide (page
74) says:
"""
*Trailing blanks* InterBase compresses trailing blanks when it stores
fixed-length strings, so data with trailing blanks uses the same amount
of space as an equivalent variable-length string. When the data is read,
InterBase reinserts the blanks. This saves disk space when the length of
the data items varies widely.
"""
Which seems to be an oversimplification of the RLE compression that is
applied on the record when storing (and the fact VARCHAR is actually not
much different from CHAR inside the Firebird engine).
However for the observable effects for the user there is no difference.
But the entire record will be compressed before being written to disk.",
meaning that on disk, a record can be shorter because of the RLE
compression that is applied. However even in that shorter form, the
value is still padded (but that padding will be compressed due to the RLE).
on-disk encoding form.
inferring too much on a minor error in user documentation.
correct. See
https://docs.microsoft.com/en-us/sql/t-sql/statements/set-ansi-padding-transact-sql?view=sql-server-ver15
Specifically look at the table under Remarks and compare the effects for
CHAR(n) NOT NULL vs CHAR(n) NULL. With ANSI_PADDING OFF, CHAR(n) NULL
behaves as a VARCHAR(n) in SQL Server, while CHAR(n) NOT NULL behaves as
a SQL standard CHAR(n). With ANSI_PADDING ON, CHAR(n) NULL behaves as
CHAR(n) NOT NULL and as SQL standard CHAR(n). Similarly VARCHAR(n) in
SQL Server will trim all spaces in OFF, but preserve significant
whitespace (that is explicitly added spaces) when ON.
But here also, the behaviour is described from the user-visible effects,
and does not necessarily imply anything about the underlying storage
implementation.
standard, the underlying storage implementation does not matter much
when you **use** a database, it is about the effective behaviour.
However, if you want to discuss internals, then please make sure you are
at least in the right ballpark, or ask for confirmation instead of
stating something with conviction.
Mark
--
Mark Rotteveel
wrote:
> I found what we have been arguing over in the Firebird 2.5 LanguageThat is because there is no English version for 3.0 yet, unfortunately.
> Reference Manual since I don't seem to have the one for version 3.0. I
> looked for it at the Firebird site but it does not appear to be listed...
> However, the PDF language manual I have for version 2.5 does in factWell, that quote is unfortunately wrong. Documentation is written by
> state the following...
>
> "A fixed-length character data type. When its data is displayed,
> trailing spaces are added to the string up to the specified length.
> Trailing spaces are not stored in the database but are restored to match
> the defined
> length when the column is displayed on the client side. Network traffic
> is reduced by not sending spaces over the LAN. If the number of
> characters is not specified, 1 is used by default."
humans, and humans are fallible. This is probably an incorrect
(re)interpretation of what the InterBase 6 Data Definition guide (page
74) says:
"""
*Trailing blanks* InterBase compresses trailing blanks when it stores
fixed-length strings, so data with trailing blanks uses the same amount
of space as an equivalent variable-length string. When the data is read,
InterBase reinserts the blanks. This saves disk space when the length of
the data items varies widely.
"""
Which seems to be an oversimplification of the RLE compression that is
applied on the record when storing (and the fact VARCHAR is actually not
much different from CHAR inside the Firebird engine).
However for the observable effects for the user there is no difference.
> If the CHAR data type is not being stored with the additional paddedAnn did not say it is only padded in memory. She said "In memory, yes.
> characters but only done so in memory as Ann Harrison has stated than I
> have so far only found this information to be at odds with other
> information I have found regarding this data type.
But the entire record will be compressed before being written to disk.",
meaning that on disk, a record can be shorter because of the RLE
compression that is applied. However even in that shorter form, the
value is still padded (but that padding will be compressed due to the RLE).
> Now what is being said is that internally, a CHAR data type no longerNo, that is not how it works. See also my other email that described the
> pads its field to the defined length at the point of creation. In this
> case then, the field would then always be expanded to the length
> required of updated data to the maximum defined at field definition
> time. VARCHAR data on the other hand will act in the same manner except
> to use its length-info bytes to base its updated storage field size on.
on-disk encoding form.
> So when did this change? Is this a new difference between Firebird 2.5It did not change, it has been the same since before Firebird 1. You are
> and 3.x.x?
inferring too much on a minor error in user documentation.
> Here is a link to a detailed description as to how the major databasesYou forgot to post a link.
> store CHAR and VARCHAR data internally. With the exception of SQLYour description of how SQL Server ANSI_PADDING works is not entirely
> Server, which specifies that ANSI_PADDING has to be set to "ON" to store
> padded spaces with CHAR data types (which, to my knowledge, is how I
> mostly worked with this database engine), all of the listed database
> engines (with the exception of SQLite) say about the same thing for such
> storage as I have been stating here. For SQLite, this would be expected
> of since it is a not a strongly typed database engine.
correct. See
https://docs.microsoft.com/en-us/sql/t-sql/statements/set-ansi-padding-transact-sql?view=sql-server-ver15
Specifically look at the table under Remarks and compare the effects for
CHAR(n) NOT NULL vs CHAR(n) NULL. With ANSI_PADDING OFF, CHAR(n) NULL
behaves as a VARCHAR(n) in SQL Server, while CHAR(n) NOT NULL behaves as
a SQL standard CHAR(n). With ANSI_PADDING ON, CHAR(n) NULL behaves as
CHAR(n) NOT NULL and as SQL standard CHAR(n). Similarly VARCHAR(n) in
SQL Server will trim all spaces in OFF, but preserve significant
whitespace (that is explicitly added spaces) when ON.
But here also, the behaviour is described from the user-visible effects,
and does not necessarily imply anything about the underlying storage
implementation.
> If in fact there was a significant change to how the Firebird engineThe behaviour of CHAR and VARCHAR in Firebird are compliant with the SQL
> stores such CHAR than it would be mirroring with SQLite and not the
> other major database engines available.
standard, the underlying storage implementation does not matter much
when you **use** a database, it is about the effective behaviour.
However, if you want to discuss internals, then please make sure you are
at least in the right ballpark, or ask for confirmation instead of
stating something with conviction.
Mark
--
Mark Rotteveel