MySQL Character Set and Collation
A character set is a set of rules for representing characters, and a collation defines the order of characters in a character set. Character sets are used to store data. A character set can contain multiple collations.
Our world is a diverse world, containing many languages ββand different characters. In order to store these different characters more efficiently, various character sets have emerged.
A character set defines rules for representing a set of characters (may be not all characters). Some character sets can only represent single-byte characters, and some others can represent multi-byte characters. Some character sets always represent characters in multiple bytes, and some others use single bytes to represent single-byte characters and multiple bytes to represent multi-byte characters.
The collation is the ordering logic of the characters in the character set. For example A
and a
, it may be A > a
in a collation, or may be A = a
in another collation, or A < a
may be in another collation.
The same character set supports multiple collations, but you must use one of them.
MySQL Character Set
MySQL supports various character sets, allowing you to store almost any character in a string.
You can use the SHOW CHARACTER SET
statement to get the character sets supported by the current MySQL server and the default collation for each character set.
SHOW CHARACTER SET;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| binary | Binary pseudo charset | binary | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |
+----------+---------------------------------+---------------------+--------+
Starting with MySQL 8, the default character set for MySQL is utf8mb4
, but in the earlier versions of MySQL, the default character set is latin1
. The latin1
character set can only represent single-byte characters, while utf8mb4
character sets can represent multi-byte characters including emoticons.
MySQL supports setting character sets at different levels, including: server-level, database-level, table-level, and column-level. You can set different character sets for different levels.
Server-level Character Set and Collation
The server-level character set and collation are the default character sets and collation rules for all databases. If you don’t specify a character set for a database, the database will used the server-level charset.
View MySQL Server Character Set
To view the server-level character set in MySQL, use the variables character_set_server
in the following statement:
SHOW VARIABLES LIKE "character_set_server";
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
To view the server-level collation in MySQL, use the variables collation_server
in following statement:
SHOW VARIABLES LIKE "collation_server";
+------------------+--------------------+
| Variable_name | Value |
+------------------+--------------------+
| collation_server | utf8mb4_0900_ai_ci |
+------------------+--------------------+
Set Server Character Set and Collation
There are 3 ways to set the character set of MySQL server.
-
Set the character set and collation of the MySQL server in the configuration file:
[mysqld] character-set-server = utf8mb4 collation-server = utf8mb4_0900_ai_ci
It is recommended and it is permanent. Remember to restart the MySQL server after modifying the configuration file.
-
Specify the character set and collation when starting the MySQL server:
mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_0900_ai_ci
-
Modify the character set and collation at the command line.
SET character_set_server = utf8mb4; SET collation_server = utf8mb4_0900_ai_ci;
This is only valid for the current session.
Database-level Character Set and Collation
The database-level character set and collation only take effect for the current database, which is also the default character set and collation of tables in the database.
View the character set and collation of the database
To view the character set and collation of the current database, use the following statement:
SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+--------------------------+----------------------+
To view the character set and collation for a specified database, use the following statement:
SHOW CREATE DATABASE testdb;
+----------+----------------------------------------------------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+----------------------------------------------------------------------------------------------------------------------------------+
| testdb | CREATE DATABASE `testdb` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */ |
+----------+----------------------------------------------------------------------------------------------------------------------------------+
You can get the database character set and collation in the CREATE DATABASE
statement.
Set database-level character set and collation
You can specify the character set and collation using CHARACTER SET
and COLLATE
clauses when creating a database, as follows:
CREATE DATABASE database_name
[CHARACTER SET charset_name]
[COLLATE collation_name];
You can also modify the character set and collation of the database using the ALTER DATABASE
statement:
ALTER DATABASE database_name
[CHARACTER SET charset_name]
[COLLATE collation_name];
Table-level character sets and collations
MySQL supports setting different character sets and collations for tables. If not set, the database-level character set and collation are inherited.
Set table-level character set and collation
To set character set and collation when creating a table, use the following CREATE TABLE
statement:
CREATE TABLE table_name (column_list)
[CHARACTER SET charset_name]
[COLLATE collation_name]
To modify the character set and collation of a table, use the following ALTER TABLE
statement:
ALTER TABLE table_name
[CHARACTER SET charset_name]
[COLLATE collation_name]
View table-level character set and collation
There are 3 ways to view the character set and collation of a table:
-
Use the
SHOW CREATE TABLE
statement view the character set and collation of a table:SHOW CREATE TABLE t1;
*************************** 1\. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `c1` int DEFAULT NULL, `c2` char(1) DEFAULT NULL, `c3` json DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
-
Use the
SHOW TABLE STATUS
statement view the character set and collation of a table:SHOW TABLE STATUS FROM testdb LIKE 't1'\G;
*************************** 1\. row *************************** Name: t1 Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 5 Avg_row_length: 3276 Data_length: 16384 Max_data_length: 0 Index_length: 0 Data_free: 0 Auto_increment: NULL Create_time: 2022-04-20 02:31:42 Update_time: NULL Check_time: NULL Collation: utf8mb4_0900_ai_ci Checksum: NULL Create_options: Comment:
Here, the column
Collation
is the collationutf8mb4_0900_ai_ci
, which corresponds to the character setutf8mb4
. -
View a table’s character set and collation from the
information_schema.TABLES
table:SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "testdb" AND TABLE_NAME = "t1"\G
+--------------------+ | TABLE_COLLATION | +--------------------+ | utf8mb4_0900_ai_ci | +--------------------+
Here, the column
TABLE_COLLATION
is the collation,utf8mb4_0900_ai_ci
corresponds to the character setutf8mb4
.
Column-level Character Set and Collation
The default character set and collation of a column is inherited from the table, but MySQL supports setting a different character set and collation for a column.
To set independent character set and collation for a column, define a column like this:
col_name data_type
[CHARACTER SET charset_name]
[COLLATE collation_name]
You can use the above column definitions in CREATE TABLE
and ALTER TABLE
statements to set individual character sets and collations for columns, for example:
ALTER TABLE t1
MODIFY c4 VARCHAR(20)
CHARACTER SET 'utf8'
COLLATE 'utf8_bin';
To view the character set and collation of a column, you can get them in the information_schema.COLUMNS
table:
SELECT CHARACTER_SET_NAME,
COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = "testdb"
AND TABLE_NAME = "t1"
AND COLUMN_NAME = "c4";
Conclusion
MySQL provides various character sets to efficiently store various characters. You can set different character levels and collations for the server, databases, tables, and columns.
If you need to convert strings between different character sets, use two functions provided by MySQL: CONVERT
and CAST
.