Oracle® Database Globalization Support Guide
10g Release 2 (10.2)
B14225-01

B Unicode Character Code Assignments

This appendix offers an introduction to Unicode character assignments. It covers Unicode code ranges, UTF-16 encoding, and UTF-8 encoding.

Unicode Code Ranges

Table B-1 contains code ranges that have been allocated in Unicode for UTF-16 character codes.

Table B-1 Unicode Character Code Ranges for UTF-16 Character Codes

Types of Characters                          First 16 Bits  Second 16 Bits
ASCII                                        0000 - 007F    -
European (except ASCII), Arabic, Hebrew      0080 - 07FF    -
Indic, Thai, certain symbols (such as the    0800 - 0FFF    -
euro symbol), Chinese, Japanese, Korean      1000 - CFFF    -
                                             D000 - D7FF    -
                                             F900 - FFFF    -
Private Use Area #1                          E000 - EFFF    -
                                             F000 - F8FF    -
Supplementary characters: Additional         D800 - D8BF    DC00 - DFFF
Chinese, Japanese, and Korean characters;    D8C0 - DABF    DC00 - DFFF
historic characters; musical symbols;        DAC0 - DB7F    DC00 - DFFF
mathematical symbols
Private Use Area #2                          DB80 - DBBF    DC00 - DFFF
                                             DBC0 - DBFF    DC00 - DFFF


Table B-2 contains code ranges that have been allocated in Unicode for UTF-8 character codes.

Table B-2 Unicode Character Code Ranges for UTF-8 Character Codes

Types of Characters                          First Byte  Second Byte  Third Byte  Fourth Byte
ASCII                                        00 - 7F     -            -           -
European (except ASCII), Arabic, Hebrew      C2 - DF     80 - BF      -           -
Indic, Thai, certain symbols (such as the    E0          A0 - BF      80 - BF     -
euro symbol), Chinese, Japanese, Korean      E1 - EC     80 - BF      80 - BF     -
                                             ED          80 - 9F      80 - BF     -
                                             EF          A4 - BF      80 - BF     -
Private Use Area #1                          EE          80 - BF      80 - BF     -
                                             EF          80 - A3      80 - BF     -
Supplementary characters: Additional         F0          90 - BF      80 - BF     80 - BF
Chinese, Japanese, and Korean characters;    F1 - F2     80 - BF      80 - BF     80 - BF
historic characters; musical symbols;        F3          80 - AF      80 - BF     80 - BF
mathematical symbols
Private Use Area #2                          F3          B0 - BF      80 - BF     80 - BF
                                             F4          80 - 8F      80 - BF     80 - BF



Note:

A hyphen (-) represents a nonapplicable code assignment. Character codes are shown in hexadecimal representation.
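The rows of Table B-1 and Table B-2 can be cross-checked with a short script. The following Python sketch is illustrative only. The names `RANGES` and `describe` are hypothetical, the two supplementary code point ranges (U+10000 to U+EFFFF and U+F0000 to U+10FFFF) are derived from the surrogate ranges in Table B-1 rather than stated in it, and the encodings come from Python's built-in codecs, not from an Oracle database.

```python
# Code point ranges per character type. The BMP rows come from Table B-1.
# The last two rows are an assumption derived from its surrogate ranges:
# D800-DB7F leads U+10000-U+EFFFF, DB80-DBFF leads U+F0000-U+10FFFF.
CJK_ETC = "Indic, Thai, certain symbols, Chinese, Japanese, Korean"
RANGES = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x07FF, "European (except ASCII), Arabic, Hebrew"),
    (0x0800, 0xD7FF, CJK_ETC),
    (0xE000, 0xF8FF, "Private Use Area #1"),
    (0xF900, 0xFFFF, CJK_ETC),
    (0x10000, 0xEFFFF, "Supplementary characters"),
    (0xF0000, 0x10FFFF, "Private Use Area #2"),
]


def describe(ch):
    """Return (character type, UTF-16 code units, UTF-8 bytes) as hex strings."""
    cp = ord(ch)
    kind = next((name for lo, hi, name in RANGES if lo <= cp <= hi), None)
    if kind is None:
        # D800-DFFF are surrogate code units, not characters.
        raise ValueError(f"U+{cp:04X} is a surrogate, not a character")
    utf16 = ch.encode("utf-16-be").hex().upper()
    utf8 = ch.encode("utf-8").hex().upper()
    # Split the hex strings into 16-bit units and single bytes.
    units = [utf16[i:i + 4] for i in range(0, len(utf16), 4)]
    octets = [utf8[i:i + 2] for i in range(0, len(utf8), 2)]
    return kind, units, octets


# One sample per row group: A, e-acute, euro sign, musical G clef, first PUA #2 code point.
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e", "\U000f0000"]:
    print(f"U+{ord(ch):04X}", *describe(ch))
```

Each printed code unit and byte should fall inside the range that the tables list for the reported character type.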

UTF-16 Encoding

As shown in Table B-1, UTF-16 represents some characters (the supplementary characters, such as additional Chinese, Japanese, and Korean characters, and the characters in Private Use Area #2) with two 16-bit units. The first 16-bit value is in the range 0xD800 to 0xDBFF, and the second 16-bit value is in the range 0xDC00 to 0xDFFF. With supplementary characters, UTF-16 character codes can represent more than one million characters. Without supplementary characters, only 65,536 characters can be represented. Oracle's AL16UTF16 character set supports supplementary characters.
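The arithmetic behind the two 16-bit values can be sketched as follows. This is a minimal Python illustration of standard UTF-16 surrogate pair arithmetic, not Oracle code, and the function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (first, second) 16-bit values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    first = 0xD800 + (offset >> 10)        # upper 10 bits -> D800-DBFF
    second = 0xDC00 + (offset & 0x3FF)     # lower 10 bits -> DC00-DFFF
    return first, second


def from_surrogate_pair(first, second):
    """Combine a (first, second) pair back into one code point."""
    if not (0xD800 <= first <= 0xDBFF and 0xDC00 <= second <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)


# 1,024 first values times 1,024 second values give the supplementary range.
SUPPLEMENTARY_COUNT = (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1)

first, second = to_surrogate_pair(0x1D11E)   # musical symbol G clef
print(f"{first:04X} {second:04X}", SUPPLEMENTARY_COUNT)
```

The 1,024 possible first values combined with the 1,024 possible second values yield 1,048,576 supplementary code points, which is where the "more than one million" figure comes from.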

UTF-8 Encoding

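The UTF-8 character codes in Table B-2 show that the following conditions are true:

ASCII characters use 1 byte.
European (except ASCII), Arabic, and Hebrew characters use 2 bytes.
Indic, Thai, Chinese, Japanese, and Korean characters, as well as certain symbols such as the euro symbol, use 3 bytes.
Characters in Private Use Area #1 use 3 bytes.
Supplementary characters use 4 bytes.
Characters in Private Use Area #2 use 4 bytes.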

Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
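To see which characters in a string fall into the 4-byte category, a check along the following lines may help. It is a sketch using Python's standard UTF-8 codec only. The function names `utf8_length` and `four_byte_characters` are hypothetical, and the code does not query a database or show how either Oracle character set handles the characters it reports.

```python
def utf8_length(ch):
    """Number of bytes that standard UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


def four_byte_characters(text):
    """Characters in text that need a 4-byte UTF-8 value."""
    return [ch for ch in text if utf8_length(ch) == 4]


# A (ASCII), e-acute (European), euro sign (symbol), G clef (supplementary)
sample = "A\u00e9\u20ac\U0001d11e"
for ch in sample:
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")

# Supplementary and Private Use Area #2 characters are the ones flagged here.
flagged = four_byte_characters(sample)
print([f"U+{ord(ch):04X}" for ch in flagged])
```

A non-empty result means the text contains supplementary or Private Use Area #2 characters, which is the case where the choice between AL32UTF8 and UTF8 matters.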