Since the last phpMyAdmin update, I see that the default character set is now utf8mb4. I would like to know what the difference between utf8mb4 and utf8 is, and whether there is any known reason why this variant of utf8, if we can call it that, exists.
Also, if I decide to change the character set of my tables and columns to utf8mb4, could I run into any problems?
Good day. As the documentation mentions, this "variant" of UTF-8 was added in MySQL 5.5.3. So what is the difference?
UTF-8: The UTF-8 encoding can represent every symbol in the Unicode character set, which ranges from U+0000 to U+10FFFF. That's 1,114,112 possible symbols. (Not all of these Unicode code points have been assigned characters yet, but that doesn't prevent UTF-8 from being able to encode them.)
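A quick way to see how UTF-8 spreads this range over one to four bytes is to encode the boundary code points of each length class. This is a small illustrative sketch using Python's built-in UTF-8 codec; `utf8_length` is a hypothetical helper name, and the surrogate range U+D800 to U+DFFF is deliberately avoided because those values are not encodable characters.

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 needs to encode one Unicode code point."""
    return len(chr(code_point).encode("utf-8"))

# First and last code point of each UTF-8 length class (1, 2, 3 and 4 bytes).
for cp in (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")

# Size of the whole code space, U+0000 to U+10FFFF inclusive.
print(0x10FFFF + 1)  # 1114112
```

Everything up to U+FFFF fits in at most three bytes; everything from U+10000 upward needs four.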
We have often used MySQL's utf8 charset for databases, tables, and columns, assuming it maps to the UTF-8 encoding described above and that almost any symbol can therefore be stored.
Example:
Now look at the warnings:
It turns out that MySQL's utf8 charset only partially implements proper UTF-8 encoding: it can only store symbols whose UTF-8 encoding takes one to three bytes. Symbols that take four bytes are not supported.
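To check ahead of time whether some text would survive in a utf8 column, it is enough to test whether every code point is at most U+FFFF, since exactly those encode in three bytes or fewer. A minimal sketch in Python; `fits_in_mysql_utf8` is a hypothetical helper, not part of MySQL or any client library.

```python
def fits_in_mysql_utf8(text):
    """Return True if `text` only contains code points up to U+FFFF.

    Those are the code points whose UTF-8 encoding needs at most 3 bytes,
    which is the limit of MySQL's 3-byte `utf8` charset. Anything above
    U+FFFF needs 4 bytes and therefore needs utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)

print(fits_in_mysql_utf8("café"))          # True
print(fits_in_mysql_utf8("\U0001F4A9"))    # False
print("\U0001F4A9".encode("utf-8"))        # b'\xf0\x9f\x92\xa9' (4 bytes)
```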
This limitation affects not only the character in the example, but also more important symbols such as U+1F4A9 (💩). In total, 1,048,576 possible code points (U+10000 to U+10FFFF) cannot be stored. In fact, MySQL's utf8 can only store 5.88% ((0x00FFFF + 1) / (0x10FFFF + 1)) of all possible Unicode code points. Proper UTF-8 can encode 100% of them.
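The figures above follow from simple arithmetic on the size of the code space, which can be reproduced directly:

```python
total = 0x10FFFF + 1          # all code points, U+0000 to U+10FFFF
bmp = 0x00FFFF + 1            # code points MySQL's 3-byte utf8 can store
unsupported = total - bmp     # U+10000 to U+10FFFF, the 4-byte symbols

print(total, bmp, unsupported)    # 1114112 65536 1048576
print(f"{bmp / total:.2%}")       # 5.88%
```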
Now, if you want to change the encoding of your tables or databases: utf8mb4 is a superset of utf8 (every character stored in utf8 has the same encoding in utf8mb4), so the conversion itself should be safe. Even so, make a backup of your data before migrating anything.
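As a rough sketch of what such a migration involves, the helper below only builds the MySQL statements as strings so they can be reviewed before running them against a backed-up database. The function name, the database and table names, and the choice of the `utf8mb4_unicode_ci` collation are all assumptions for illustration; pick the collation that matches your needs.

```python
def conversion_statements(database, tables, collation="utf8mb4_unicode_ci"):
    """Build (but do not execute) statements that move a schema to utf8mb4.

    `database`, `tables` and the default `collation` are placeholders.
    """
    # ALTER DATABASE only changes the default for tables created later.
    statements = [
        f"ALTER DATABASE `{database}` CHARACTER SET = utf8mb4 COLLATE = {collation};"
    ]
    for table in tables:
        # CONVERT TO re-encodes the existing character columns of the table,
        # whereas plain CHARACTER SET would only change the table default.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


for stmt in conversion_statements("mydb", ["users", "posts"]):
    print(stmt)
```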
As the documentation notes: