Subject | Re: [Firebird-Java] Connection character set information |
---|---|
Author | William L. Thomson Jr. |
Post date | 2017-03-11T18:52:57Z |
On Saturday, March 11, 2017 11:47:19 AM EST you wrote:
> Maybe you should switch to its alias encoding instead, less chance of
> typos ;)

That is for another day, but that was just in typing the email, not in usage.

> I'm not sure what you mean by bloating in the case of FB3 (it should
> behave similar to FB2.5, otherwise you might have hit a bug).

Maybe, I just noticed some databases are now much larger than they have been,
and I am pretty sure it is not due to increased usage. I will have to look at
some others.

> One of the
> main drawbacks of UTF8 in Firebird is that it reduces the max number of
> characters you can store in a CHAR or VARCHAR by a factor 4 (eg only
> 8191 instead of 32764), and it could cause issues with the maximum
> length of the parameter or value blocks in the protocol, if the total
> (maximum) length of parameters (or result columns) exceeds 64K bytes
> (where blobs count as 8 bytes).

Maybe that was the reason I avoided it then. Not that I am storing large
amounts, but I think it threw off all sizes.
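
For illustration only, the factor-4 arithmetic from the quoted paragraph can be sketched in Java. The class and method names are hypothetical, and the byte limit is simply the 32764 figure quoted above, not a value checked against any particular Firebird version.

```java
// Sketch of the character-limit arithmetic described in the quoted text.
// VarcharLimit and maxChars are hypothetical names used only for this example.
final class VarcharLimit {

    // Returns how many characters fit in a column limited to maxBytes,
    // when every character may need up to maxBytesPerChar bytes.
    static int maxChars(int maxBytes, int maxBytesPerChar) {
        if (maxBytes < 0 || maxBytesPerChar < 1) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return maxBytes / maxBytesPerChar; // integer division rounds down
    }

    public static void main(String[] args) {
        int quotedLimit = 32764; // figure taken from the quoted email
        System.out.println("1 byte per char:   " + maxChars(quotedLimit, 1));
        System.out.println("4 bytes per char:  " + maxChars(quotedLimit, 4));
    }
}
```

With the quoted limit, a worst case of 4 bytes per character leaves room for 8191 characters, which matches the number in the quoted text.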

> Yes, ISO-8859-1 defines only 191 characters, while - according to
> Wikipedia - UTF-8 encodes 1,112,064 codepoints ('characters'). In short
> all of ISO-8859-1 maps to UTF-8.

Yes, I just had fun with some of that, parity bit flipping, etc.

> Purely looking byte-wise, only bytes 0-127 map from ISO-8859-1 directly
> into UTF-8 (ie the ASCII range), if you try to decode ISO-8859-1 bytes
> 128-255 as UTF-8, you'll end up with question marks in Java (or a
> transliteration error in Firebird). If you encode ISO-8859-1 characters
> 128-255* to UTF-8, they will have a 2-byte encoding per character.

I was just there, felt like living in the past...
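
The byte-level behaviour described in the quoted paragraph can be tried out with a small Java sketch. The class and method names are hypothetical. Note that when decoding via `new String(bytes, charset)`, Java substitutes the replacement character U+FFFD for malformed input, which is often displayed as a question mark.

```java
import java.nio.charset.StandardCharsets;

// Sketch comparing ISO-8859-1 and UTF-8 handling of one character.
// Latin1VsUtf8 and its methods are hypothetical names for this example.
final class Latin1VsUtf8 {

    // Number of bytes needed to encode a single char as UTF-8.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    // Decodes raw bytes as UTF-8; malformed input becomes U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char eAcute = '\u00E9'; // e with acute accent, byte 0xE9 in ISO-8859-1
        byte[] latin1 = String.valueOf(eAcute).getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8Length(eAcute));
        // Decoding the single ISO-8859-1 byte as UTF-8 does not give the character back.
        System.out.println("Round trip ok:    "
                + decodeAsUtf8(latin1).equals(String.valueOf(eAcute)));
    }
}
```

Characters in the ASCII range take one byte in both encodings, so only the upper half of ISO-8859-1 shows this difference.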

> No, it is better to pick one. Specifying both lc_ctype and charSet is
> risky if you use mismatched values (say lc_ctype=WIN1250 and
> charSet=Cp1252 instead of Cp1250). The possibility to specify both is an
> oddity that allows you to 'translate' between character sets if they
> happen to have been stored incorrectly. It is a feature that I almost
> broke early in Jaybird 3 because I initially thought it was a bug that
> you could have mismatched values for character sets other than NONE.

I will stick with one, charSet. Unless for some reason I need to translate.
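
As a minimal sketch of "picking one", the connection properties could be built with only `charSet` set and `lc_ctype` left out. The class and helper names and the credentials are hypothetical; only the property names `charSet` and `lc_ctype` and the value `Cp1250` come from the quoted text.

```java
import java.util.Properties;

// Sketch: build JDBC connection properties that specify only charSet.
// CharsetProps and withJavaCharset are hypothetical names for this example.
final class CharsetProps {

    // Sets user, password and charSet; lc_ctype is deliberately not set,
    // so the two cannot end up with mismatched values.
    static Properties withJavaCharset(String user, String password, String javaCharset) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("charSet", javaCharset);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder credentials, for illustration only.
        Properties props = withJavaCharset("someuser", "somepassword", "Cp1250");
        System.out.println(props.getProperty("charSet"));
        System.out.println(props.containsKey("lc_ctype"));
    }
}
```

The resulting `Properties` object would then be passed to `DriverManager.getConnection(url, props)` together with the JDBC URL of the database.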
--
William L. Thomson Jr.