firebird-java - Re: [Firebird-Java] Connection character set information

Subject	Re: [Firebird-Java] Connection character set information
Author	William L. Thomson Jr.
Post date	2017-03-11T18:52:57Z

On Saturday, March 11, 2017 11:47:19 AM EST you wrote:

>
> Maybe you should switch to its alias encoding instead, less chance of
> typos ;)

That is for another day, but the typo was only in typing the email, not in actual usage.

> I'm not sure what you mean by bloating in the case of FB3 (it should
> behave similarly to FB2.5, otherwise you might have hit a bug).

Maybe, I just noticed some databases are now much larger than they have been,
and I am pretty sure it is not due to increased usage. I will have to look at
some others.

> One of the
> main drawbacks of UTF8 in Firebird is that it reduces the max number of
> characters you can store in a CHAR or VARCHAR by a factor of 4 (e.g. only
> 8191 instead of 32764), and it could cause issues with the maximum
> length of the parameter or value blocks in the protocol, if the total
> (maximum) length of parameters (or result columns) exceeds 64K bytes
> (where blobs count as 8 bytes).

Maybe that was the reason I avoided it then. Not that I am storing large
amounts, but I think it threw off all sizes.
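
For reference, the factor-4 arithmetic quoted above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up, and the 32764 figure is taken as-is from the quoted text rather than from the Firebird documentation.

```java
public class Utf8ColumnLimit {
    // Characters that fit in maxBytes when each character may need
    // up to bytesPerChar bytes (integer division rounds down).
    static int maxChars(int maxBytes, int bytesPerChar) {
        return maxBytes / bytesPerChar;
    }

    public static void main(String[] args) {
        int maxBytes = 32764; // figure quoted in the message above
        // Single-byte character set such as ISO8859_1: one byte per character.
        System.out.println(maxChars(maxBytes, 1));
        // UTF8: space is reserved for up to four bytes per character.
        System.out.println(maxChars(maxBytes, 4));
    }
}
```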

> Yes, ISO-8859-1 defines only 191 characters, while - according to
> Wikipedia - UTF-8 encodes 1,112,064 codepoints ('characters'). In short
> all of ISO-8859-1 maps to UTF-8.

Yes, I just had fun with some of that, parity bit flipping, etc.

> Purely looking byte-wise, only bytes 0-127 map from ISO-8859-1 directly
> into UTF-8 (i.e. the ASCII range), if you try to decode ISO-8859-1 bytes
> 128-255 as UTF-8, you'll end up with question marks in Java (or a
> transliteration error in Firebird). If you encode ISO-8859-1 characters
> 128-255* to UTF-8, they will have a 2-byte encoding per character.

I was just there, felt like living in the past...
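
The byte-level behaviour described in the quote can be reproduced with plain Java, without Firebird involved. A minimal sketch, with hypothetical class and method names: Java's `String` constructor replaces malformed UTF-8 input with U+FFFD, the replacement character, which many consoles and fonts render as a question mark.

```java
import java.nio.charset.StandardCharsets;

public class Latin1VersusUtf8 {
    // Decode raw bytes as UTF-8; malformed input becomes U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with e-acute (U+00E9, byte 0xE9 in ISO-8859-1)

        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // One byte per character in ISO-8859-1, two bytes for U+00E9 in UTF-8.
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8.length);

        // Wrong: the lone byte 0xE9 is not valid UTF-8 on its own.
        String wrong = decodeAsUtf8(latin1);
        System.out.println("contains U+FFFD:  " + (wrong.indexOf('\uFFFD') >= 0));

        // Right: decode with the character set the bytes were written in.
        String right = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println("round trip ok:    " + right.equals(text));
    }
}
```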

> No, it is better to pick one. Specifying both lc_ctype and charSet is
> risky if you use mismatched values (say lc_ctype=WIN1250 and
> charSet=Cp1252 instead of Cp1250). The possibility to specify both is an
> oddity that allows you to 'translate' between character sets if they
> happen to have been stored incorrectly. It is a feature that I almost
> broke early in Jaybird 3 because I initially thought it was a bug that
> you could have mismatched values for character sets other than NONE.

I will stick with one, charSet. Unless for some reason I need to translate.
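
Setting only `charSet` might look roughly like the sketch below. The class name, helper methods, credentials, URL and the `ISO-8859-1` value are all placeholders chosen for illustration; only the `charSet` and `lc_ctype` property names come from the discussion above. The `main` method just inspects the properties, so it runs without a Jaybird driver or a server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class CharSetConnection {
    // Build connection properties that specify only the Java-side
    // character set name (charSet) and deliberately omit lc_ctype.
    static Properties connectionProperties(String user, String password, String javaCharset) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("charSet", javaCharset);
        return props;
    }

    // Opens the connection; needs the Jaybird driver on the classpath
    // and a reachable server, so it is not called from main here.
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        // Placeholder credentials and character set.
        Properties props = connectionProperties("someuser", "somepassword", "ISO-8859-1");
        System.out.println("charSet  = " + props.getProperty("charSet"));
        System.out.println("lc_ctype = " + props.getProperty("lc_ctype")); // null: not set
        // Hypothetical usage:
        // Connection c = open("jdbc:firebirdsql://localhost:3050/mydb", props);
    }
}
```

Keeping the property set in one helper makes it harder to end up with a mismatched `lc_ctype`/`charSet` pair by accident.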

--
William L. Thomson Jr.