Subject understanding character sets and collation
Author Andrew Zenz

Hi all.


Somewhat new to FB and still trying to understand a few things.  Currently attempting to get a handle on character sets and collation. Apologies if the questions appear to be from a simpleton.


I have read a few articles on character sets, including:

and the glossary in the manual.


1. If I understand correctly, NONE as the default character set for a database means all char, varchar and blob(1) columns accept any input as presented, meaning rubbish (corrupt?) data may be able to be stored in/ retrieved from the database?


2. When doing an order by on a column with NONE as the character set, for example ‘ORDER BY E.SURNAME’, surnames in all CAPS will be displayed before surnames in lower case (honouring the ASCII values).  Testing shows this can be overcome by using ‘UPPER(E.SURNAME)’ but this can be tedious and repetitive.  Would setting a database character set such as UTF-8 with a collation of UNICODE_CI_AI (as indicated in the second link), as a default, overcome this?  What would happen if rubbish/corrupt/invalid data was written (or attempted) to the database?


While I work/develop in Australia and much of the data entry is going to be English, some of our clients may end up wanting to include European characters.


The inference from the articles seems to be that having NONE as a character set is acceptable, but ideally it should/must have a character set.


I am trying to avoid being bitten on the but later on.


Any suggestions appreciated.