Re: UNICODE (Re: To hell with this Xkb.)
On Tue, Jul 04, 2000 at 15:49:43 +0400, Alexander Voropay wrote:
> And where does it say that the UNICODE Code Space is 000000..10FFFF?
[...]
> I read it carefully. On the one hand, they state:
>
> "A single 16-bit number is assigned to each code element defined
> by the Unicode Standard, Version 3.0. Each of these 16-bit numbers
> is called a code value [...]"
A code value (code unit) is not the same thing as a code point (Unicode
scalar value). Again, I don't want to misquote anything here, so see UTR #17.
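To make the distinction concrete, here is a small Python 3 sketch (my own illustration, not taken from UTR #17; `count_units` is a hypothetical helper). A Python 3 `str` is a sequence of code points, so its length counts code points, while the byte length of each encoded form divided by the unit size counts code units:

```python
def count_units(text: str) -> dict:
    """Count code points and code units of `text` in several encoding forms."""
    return {
        # Python 3 str is indexed by code point
        "code points": len(text),
        # UTF-16 code units are 2 bytes each (the -le codec writes no BOM)
        "UTF-16 code units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 code units are single bytes
        "UTF-8 code units": len(text.encode("utf-8")),
        # UTF-32 code units are 4 bytes each, exactly one per code point
        "UTF-32 code units": len(text.encode("utf-32-le")) // 4,
    }

sample = "A\u044f\U00010335"  # Latin A, Cyrillic ya, then supplementary U+10335
print(count_units(sample))
# {'code points': 3, 'UTF-16 code units': 4, 'UTF-8 code units': 7, 'UTF-32 code units': 3}
```

Only the supplementary character makes the UTF-16 count exceed the code point count: it needs two 16-bit code values.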
From unicode@unicode.org (I can't identify the author, because my
archives only have this as a quotation):
| In UTF-16, each 16-bit code value in the 0x0..0xD7FF range and the
| 0xE000..0xFFFF range directly corresponds to the same scalar value,
| while a "surrogate" pair of 16-bit code values algorithmically
| represents a single scalar value in the range 0x010000..0x10FFFF.
| The first half of the pair is always in the 0xD800..0xDBFF range,
| and the second half of the pair is in the 0xDC00..0xDFFF range.
| Unicode 3.0 and ISO/IEC 10646-1:2000 have adopted the UTF-16
| mechanism as the only official usage of the 0xD800..0xDFFF scalar
| range.
| Here are various ways of representing the proposed abstract
| character named "GOTHIC LETTER QAITHRA" (=Q) (which will probably be
| assigned to the Unicode scalar value 0x10335):
| * in Unicode notation, by its Unicode scalar value: U-00010335
| * as a UCS-4 code value sequence, in hex notation: 0x00010335
| * as a UCS-2 code value sequence: illegal; out of range
| * as a UTF-16 code value sequence, in hex notation: 0xD800 0xDF35
| * in Unicode notation, by its Unicode value pair: U+D800 U+DF35
| * in EBNF notation: \uD800 \uDF35
| * as a UTF-8 code value sequence, in hex notation: 0xF0 0x90 0x8C 0xB5
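The surrogate-pair arithmetic quoted above and the example values for 0x10335 can be checked with a short Python 3 sketch (my own illustration; `to_surrogates`, `from_surrogates` and `utf8_encode` are hypothetical helper names, not from any library). The UTF-8 part follows today's RFC 3629 limits (at most four bytes, no surrogates); in 2000, RFC 2279 still allowed sequences of up to six bytes. The two asserts also show where the 10FFFF upper bound asked about at the top of the thread comes from.

```python
from typing import Tuple

def to_surrogates(scalar: int) -> Tuple[int, int]:
    """Split a supplementary scalar value (0x10000..0x10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError(f"U+{scalar:04X} is outside the supplementary range")
    offset = scalar - 0x10000          # 20-bit value 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def utf8_encode(scalar: int) -> bytes:
    """Encode one scalar value as UTF-8, using today's 4-byte limit (RFC 3629)."""
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"U+{scalar:04X} is not a Unicode scalar value")
    if scalar < 0x80:                  # 0xxxxxxx
        return bytes([scalar])
    if scalar < 0x800:                 # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6), 0x80 | (scalar & 0x3F)])
    if scalar < 0x10000:               # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    return bytes([0xF0 | (scalar >> 18),           # 11110xxx
                  0x80 | ((scalar >> 12) & 0x3F),  # then three 10xxxxxx bytes
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])

# 1024 high x 1024 low surrogates give 0x100000 supplementary values, so the
# code space ends at 0xFFFF + 0x100000 = 0x10FFFF.
assert from_surrogates(0xD800, 0xDC00) == 0x10000
assert from_surrogates(0xDBFF, 0xDFFF) == 0x10FFFF

q = 0x10335
print(f"U-{q:08X}")                                      # U-00010335
print(q > 0xFFFF)                                        # True: no UCS-2 form
print(" ".join(f"0x{u:04X}" for u in to_surrogates(q)))  # 0xD800 0xDF35
print(" ".join(f"0x{b:02X}" for b in utf8_encode(q)))    # 0xF0 0x90 0x8C 0xB5
```

Python's own `chr(q).encode("utf-16-be")` and `chr(q).encode("utf-8")` produce the same bytes, which makes a convenient cross-check.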
SY, Uwe
--
uwe@ptc.spbu.ru              | To get to the bottom of things
http://www.ptc.spbu.ru/~uwe/ | Is to go under