29 March 2006

MySQL: CHARSET from latin1 to utf8

A website im supporting needs to have multilingual characters. The default character set for MySQL is latin1. This unfortunately will not support Chinese nor other wierd multibyte characters.

It will quietly support them, but returns gibberish and will cause frustration all round.

After digging around, the best character set to use is UTF8.
To set the default charset for the server, the my.cfg/my.ini file has to be modified:

default-character-set=utf8


Unfortunately, once a database and their tables have been defined as latin1, they remain as latin1 unless you run this for each database:

alter database mydatabase charset=utf8;


and for each table:
alter table mytable charset=utf8;


and for each varchar/char type column:
alter table mytable alter column mycol charset=utf8;


and repeat ad infinitum....

This is rather tedious and boring, so there should be a better way. And that is to dump out the sql, change the charset and dump it back in. Here is the script.


===== latin1ToUTF8.sh

echo Script to convert MySQL latin1 charsets to utf8.
echo Usage: $0 dbname
echo 060329 yky Created.

echo Dumping out $1 database
mysqldump --add-drop-table $1 > db.sql

mydate=`date +%y%m%d`
echo Making a backup
mkdir bak &> /dev/null
cp db.sql bak/$1.$mydate.sql

echo String replacing latin1 with utf8
cat db.sql | replace CHARSET=latin1 CHARSET=utf8 > db2.sql

echo Pumping back $1 into database
mysql $1 < db2.sql

echo Changing db charset to utf8
mysql $1 -e "alter database $1 charset=utf8;"

echo $1 Done!


======

There must be a better way ?!

yk.