Subject | Re: Changing Charset only Pumping ? |
---|---|
Author | patrick_marten |
Post date | 2012-11-16T15:15:50Z |
> >> no, only pump. Unicode uses up to 4 bytes per character.
> >> Charset NONE uses only 1 byte per character.
> >> So, a char/varchar(20) field can store 20 characters for NONE,
> >> and from 20 down to 5 characters for UTF8, depending on how many bytes
> >> each character takes.
> >> Thus, you may need to increase the size of your character fields.
> >
> > Is that really the case?
> > Shouldn't the length remain the same and "just" the size of the database become larger?
>
> If you are talking about a database with character set NONE, then a
> (VAR)CHAR(100) will accept 100 bytes' worth of characters. For UTF-8 that
> would mean it will accept only 25 characters (I think that isn't
> entirely true, as UTF-8 uses 1 to 4 bytes per character, so you might be
> able to store more if you also use connection character set NONE and send
> the data as UTF-8).
>
> Now if the default character set is UTF-8 (or if the column itself is
> UTF-8), then (VAR)CHAR(100) will accept 100 characters, but it will
> store up to 400 bytes of data.

Ah, ok, I didn't consider that part about character set NONE.
But if I have a database with ISO8859_1 as its character set, there isn't going to be a problem with the lengths of (var)char fields when pumping the data into a UTF8 database, into (var)char fields defined with the same length / number of characters, right?
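The byte-versus-character arithmetic discussed in this thread can be checked with a small Python sketch that uses only the standard `str.encode` / `bytes.decode` codecs. It models the encodings only, not how Firebird or the pump tool actually validates field lengths. The 20-byte budget mirrors the char/varchar(20) example quoted above, and the helper name `chars_that_fit` is purely illustrative.

```python
def chars_that_fit(text: str, byte_limit: int) -> int:
    """Return how many leading characters of `text` fit in `byte_limit` bytes of UTF-8."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # UTF-8 uses 1 to 4 bytes per character
        if used + size > byte_limit:
            break
        used += size
        count += 1
    return count


# A 20-byte field (charset NONE) holding UTF-8 data: 20 down to 5 characters.
samples = {
    "ASCII 'a'": "a" * 30,               # 1 byte each
    "e-acute U+00E9": "\u00e9" * 30,     # 2 bytes each
    "euro sign U+20AC": "\u20ac" * 30,   # 3 bytes each
    "emoji U+1F600": "\U0001F600" * 30,  # 4 bytes each
}
for name, sample in samples.items():
    print(f"{name}: {chars_that_fit(sample, 20)} characters in 20 bytes")

# ISO8859_1 -> UTF-8: every byte value 0..255 decodes to exactly one character,
# so the character count is unchanged; only the byte count grows (at most 2x).
latin1_bytes = bytes(range(256))
text = latin1_bytes.decode("latin-1")
utf8_bytes = text.encode("utf-8")
print(len(latin1_bytes), "bytes ->", len(text), "characters ->", len(utf8_bytes), "UTF-8 bytes")
```

The first loop reproduces the "20 down to 5 characters" range (20, 10, 6 and 5 characters for 1-, 2-, 3- and 4-byte characters). The second part shows why the ISO8859_1 case is different: each ISO8859_1 character is one Unicode code point, so the number of characters stays the same after conversion, while the UTF-8 encoding of the full 256-character range takes 384 bytes instead of 256.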