(mysql.info.gz) Charset-Unicode

 
 10.5 Unicode Support
 ====================
 
 As of MySQL version 4.1, there are two new character sets for storing
 Unicode data:
 
    * `ucs2', the UCS-2 Unicode character set.
 
    * `utf8', the UTF8 encoding of the Unicode character set.
 
 In UCS-2 (binary Unicode representation), every character is
 represented by a two-byte Unicode code with the most significant byte
 first. For example, "LATIN CAPITAL LETTER A" has the code 0x0041 and
 is stored as the two-byte sequence 0x00 0x41. "CYRILLIC SMALL LETTER
 YERU" (Unicode 0x044B) is stored as the two-byte sequence 0x04 0x4B. For
 Unicode characters and their codes, refer to the Unicode Home
 Page (http://www.unicode.org/).
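
 The two byte sequences above can be reproduced with a short Python
 sketch. Python has no codec named `ucs2', so the sketch assumes that
 `utf-16-be' is an acceptable stand-in: for characters in the range
 U+0000..U+FFFF it should produce the same most-significant-byte-first
 pair. The helper name `ucs2_bytes' is hypothetical and is not part of
 MySQL.

```python
# Assumption: Python's "utf-16-be" codec stands in for big-endian UCS-2,
# which it matches for single characters in the range U+0000..U+FFFF.
def ucs2_bytes(ch: str) -> bytes:
    """Return the two-byte, most-significant-byte-first encoding of ch."""
    if len(ch) != 1 or ord(ch) > 0xFFFF:
        raise ValueError("expected one character in the range U+0000..U+FFFF")
    return ch.encode("utf-16-be")


latin_a = ucs2_bytes("\u0041")  # LATIN CAPITAL LETTER A
yeru = ucs2_bytes("\u044b")     # CYRILLIC SMALL LETTER YERU

print(latin_a.hex(" "))  # 00 41
print(yeru.hex(" "))     # 04 4b
```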
 
 A temporary restriction is that UCS-2 cannot yet be used as a client
 character set, so `SET NAMES 'ucs2'' does not work.
 
 The UTF8 character set (transformed Unicode representation) is an
 alternative way to store Unicode data. It is implemented according to
 RFC 2279. UTF8 encodes different Unicode characters as byte sequences
 of different lengths:
 
    * Basic Latin letters, digits, and punctuation signs use one byte.
 
    * Most European and Middle Eastern script letters fit into a
      two-byte sequence: extended Latin letters (with tilde, macron,
      acute, grave, and other accents), Cyrillic, Greek, Armenian,
      Hebrew, Arabic, Syriac, and others.
 
    * Korean, Chinese, and Japanese ideographs use three-byte sequences.
 
 
 Currently, MySQL UTF8 support does not include four-byte sequences.
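
 The sequence lengths listed above can be checked with a minimal Python
 sketch. It uses Python's own `utf-8' codec, not MySQL, and the sample
 characters are chosen here for illustration only. The last sample lies
 above U+FFFF and therefore needs a four-byte sequence, which is the
 case that MySQL UTF8 support does not currently cover.

```python
# Sample characters chosen for illustration; they are not from the manual.
samples = {
    "A": "Basic Latin letter",
    "\u00e9": "Latin small letter e with acute",
    "\u044b": "Cyrillic small letter yeru",
    "\u05d0": "Hebrew letter alef",
    "\u4e2d": "CJK ideograph",
    "\U00010400": "Deseret capital letter long I (above U+FFFF)",
}

# Number of bytes each character occupies when encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {lengths[ch]} byte(s)")
```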
 
 Tip: To save space with UTF8, use `VARCHAR' instead of `CHAR'.
 Otherwise, MySQL has to reserve 30 bytes for a `CHAR(10) CHARACTER SET
 utf8' column, because each of the 10 characters may need up to three
 bytes.
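
 The arithmetic behind the tip can be sketched in Python. This is an
 illustration only: the function names are hypothetical, and the sketch
 ignores any per-value length overhead or storage-engine details that a
 real `VARCHAR' column may involve.

```python
# Longest UTF8 sequence MySQL supports according to the text above.
MAX_BYTES_PER_CHAR = 3


def char_reserved_bytes(declared_chars: int) -> int:
    """Bytes reserved for a fixed-length CHAR(n) column in UTF8 (worst case)."""
    return declared_chars * MAX_BYTES_PER_CHAR


def utf8_payload_bytes(value: str) -> int:
    """Bytes the value itself occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


reserved = char_reserved_bytes(10)   # CHAR(10) CHARACTER SET utf8
actual = utf8_payload_bytes("MySQL")  # a short, all-ASCII value

print(reserved)  # 30
print(actual)    # 5
```

 For a short Basic Latin value such as this one, the variable-length
 payload is a small fraction of the fixed 30-byte reservation.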
