


The AL32UTF8 character set supports the latest version of the Unicode standard. It encodes characters in one, two, or three bytes. Supplementary characters require four bytes. It is for ASCII-based platforms.
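The byte counts above can be checked outside the database. The following is a minimal sketch that assumes Python's built-in "utf-8" codec produces the same byte lengths as AL32UTF8 for these characters; the sample characters and the helper name `utf8_length` are chosen only for illustration.

```python
# Sample characters: ASCII, Latin-1 supplement, CJK, and a supplementary character.
SAMPLES = ["A", "\u00e9", "\u4e2d", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes the character occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    # Print the code point and its encoded length in bytes.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

Only the last sample, which lies outside the Basic Multilingual Plane, needs four bytes.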



The UTF8 character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms.

Supplementary characters inserted into a UTF8 database do not corrupt the data in the database. A supplementary character is treated as two separate, user-defined characters that occupy 6 bytes. Oracle recommends that you switch to AL32UTF8 for full support of supplementary characters in the database character set.
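The 6-byte figure follows from encoding each half of the UTF-16 surrogate pair as its own 3-byte sequence, a scheme commonly called CESU-8. The sketch below reproduces that arithmetic in Python; it assumes the Oracle UTF8 character set behaves this way for supplementary characters, and the function name `cesu8_encode` is hypothetical.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text so that each supplementary character becomes two
    3-byte sequences (one per UTF-16 surrogate), 6 bytes in total."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets the codec emit bytes for a lone surrogate.
                out += chr(surrogate).encode("utf-8", errors="surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


supplementary = "\U0001F600"
print(len(supplementary.encode("utf-8")))  # standard UTF-8 length
print(len(cesu8_encode(supplementary)))    # surrogate-pair-based length
```

The same character therefore takes 4 bytes in standard UTF-8 but 6 bytes when stored as two surrogates.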


The UTFE character set is for EBCDIC platforms. It is similar to UTF8 on ASCII platforms, but it encodes characters in one, two, three, and four bytes. Supplementary characters are converted as two 4-byte characters.



One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used for internal processing by Java since version J2SE 5.0 and by Microsoft Windows since version 2000.
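As a quick illustration of the 2-byte versus 4-byte split, the sketch below measures encoded lengths with Python's "utf-16-be" codec (the big-endian variant is used so that no byte order mark is added). The helper name `utf16_length` is an illustrative assumption.

```python
def utf16_length(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-16."""
    return len(ch.encode("utf-16-be"))


for ch in ["A", "\u4e2d", "\U0001F600"]:
    # BMP characters take one 16-bit code unit; supplementary characters take two.
    print(f"U+{ord(ch):04X} -> {utf16_length(ch)} bytes")
```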

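(One way to look up a character's Unicode code point: in MS Word, press Alt + X to convert the character before the cursor to its four-digit hexadecimal Unicode code point, and back again.)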


• Unicode database. When creating the database, specify AL32UTF8 as the database character set (CREATE DATABASE ... CHARACTER SET AL32UTF8). Columns of the SQL character types (CHAR, VARCHAR2, CLOB, and LONG) then store Unicode data encoded in UTF-8.

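• Unicode data types. Declare columns or variables with the national character data types: NCHAR, NVARCHAR2, and NCLOB. These types can use either UTF8 or AL16UTF16 as their character set; the default is AL16UTF16. The national character set can be specified when the database is created (CREATE DATABASE ... NATIONAL CHARACTER SET AL16UTF16 or UTF8).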


With the AL16UTF16 character set, the maximum lengths of NCHAR and NVARCHAR2 columns are 1000 and 2000 characters, respectively. Because the data is fixed-width, the lengths are guaranteed. (In practice, if the data contains supplementary characters, might the maximum length fall short of 1000 and 2000?)
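The question in parentheses can be explored with a small calculation. The sketch below assumes that the length limit is counted in 16-bit UTF-16 code units rather than in full code points; this is an assumption made for illustration, not something the text above confirms. Under that assumption a supplementary character uses up two units of the limit. The function names and the default limit of 2000 are illustrative.

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def fits_in_limit(text: str, limit: int = 2000) -> bool:
    """Check text against a limit expressed in UTF-16 code units (assumed)."""
    return utf16_code_units(text) <= limit


bmp_text = "\u4e2d" * 2000                 # 2000 BMP characters
supplementary_text = "\U0001F600" * 1001   # 1001 supplementary characters

print(fits_in_limit(bmp_text))             # 2000 code units
print(fits_in_limit(supplementary_text))   # 2002 code units
```

If the assumption holds, a value made up entirely of supplementary characters would reach the limit at half the nominal character count.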



You can store Unicode characters in an Oracle Database in two ways:

• You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL character data types (CHAR, VARCHAR2, CLOB, and LONG).

• You can declare columns and variables that have SQL national character data types.

The SQL national character data types are NCHAR, NVARCHAR2, and NCLOB. They are also called Unicode data types, because they are used only for storing Unicode data.

The national character set, which is used for all SQL national character data types, is specified when the database is created. The national character set can be either UTF8 or AL16UTF16 (default).

When you declare a column or variable of the type NCHAR or NVARCHAR2, the length that you specify is the number of characters, not the number of bytes.

To view the database character set and the national character set:

select * from v$nls_parameters;


NLS_CHARACTERSET - database character set

NLS_NCHAR_CHARACTERSET - national character set

On choosing between a Unicode database and Unicode data types, Oracle gives the following recommendations.

A Unicode database is recommended in the following situations:



You need easy code migration for Java or PL/SQL.


If your existing application is mainly written in Java and PL/SQL and your main concern is to minimize the code changes required to support multiple languages, then you may want to use a Unicode database solution. If the data types used to store data remain as SQL CHAR data types, then the Java and PL/SQL code that accesses these columns does not need to change.

You have evenly distributed multilingual data.


If the multilingual data is evenly distributed in existing schema tables and you are not sure which tables contain multilingual data, then you should use a Unicode database because it does not require you to identify the kind of data that is stored in each column.

Your SQL statements and PL/SQL code contain Unicode data.


You must use a Unicode database. SQL statements and PL/SQL code are converted into the database character set before being processed. If the SQL statements and PL/SQL code contain characters that cannot be converted to the database character set, then those characters are lost. A common place to use Unicode data in a SQL statement is in a string literal.
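The loss described above can be imitated outside the database. In the sketch below, Python's "latin-1" codec stands in for a hypothetical non-Unicode database character set, and `errors="replace"` stands in for the substitution of unconvertible characters. The statement text and the function name `convert_to_charset` are made up for illustration, and the replacement character a real database uses may differ.

```python
def convert_to_charset(sql_text: str, charset: str = "latin-1") -> str:
    """Round-trip SQL text through a target character set, replacing
    every character that the character set cannot represent."""
    return sql_text.encode(charset, errors="replace").decode(charset)


statement = "SELECT 'caf\u00e9 \u4e2d\u6587' FROM dual"

print(statement)
print(convert_to_charset(statement))           # CJK characters are lost
print(convert_to_charset(statement, "utf-8"))  # nothing is lost
```

The accented letter survives because the stand-in character set contains it, while the two CJK characters in the string literal are replaced; with a Unicode target the statement is unchanged.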

You want to store multilingual documents in BLOB format and use Oracle Text for content searching.


You must use a Unicode database. The BLOB data is converted to the database character set before being indexed by Oracle Text. If your database character set is not UTF8, then data is lost when the documents contain characters that cannot be converted to the database character set.


