[Lazarus] UTF16 2 utf8

Hans-Peter Diettrich DrDiettrich1 at aol.com
Thu May 5 16:40:38 CEST 2011


José Mejuto wrote:

> I think the text saying that UCS2 has been extended does not mean
> that UCS2 itself was extended; it says that UCS2 has been extended
> to UTF-16, so UCS2 can no longer be considered Unicode, as noted in
> ISO 10646:
> 
> UCS-2. UCS-2 stands for "Universal Character Set coded in 2 octets" and is also known as
> "the two-octet BMP form." It was documented in earlier editions of 10646 as the two-octet
> (16-bit) encoding consisting only of code positions for plane zero, the Basic Multilingual
> Plane. This documentation has been removed from ISO/IEC 10646:2011, and the term
> UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either
> 10646 or the Unicode Standard.

I agree that UCS-2 no longer covers the current Unicode range, but
it is still a true subset of UCS-4 (the BMP).
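
To make the subset relation concrete, here is a minimal Python sketch.
The helper names are made up for illustration and come from no library.
It assumes the usual reading that UCS-2 stored a BMP code point as one
16-bit unit with the same value, so for those code points UCS-2 and
UTF-16 agree:

```python
def is_ucs2_representable(cp):
    """True if the code point fits the old two-octet BMP form."""
    # The BMP is U+0000..U+FFFF; U+D800..U+DFFF is reserved for surrogates.
    return 0 <= cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


def utf16_units(cp):
    """Return the UTF-16 code units for one (non-surrogate) code point."""
    data = chr(cp).encode("utf-16-be")
    # Split the big-endian byte string into 16-bit units.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A BMP character: one unit, numerically equal to the code point.
euro = 0x20AC
print(is_ucs2_representable(euro), [hex(u) for u in utf16_units(euro)])

# A character outside the BMP: no UCS-2 form, two units in UTF-16.
emoji = 0x1F600
print(is_ucs2_representable(emoji), [hex(u) for u in utf16_units(emoji)])
```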

The UCS standards define Unicode as ranges of values, while the UTF 
standards define encodings.

The UTF-7/8 encodings are purely numerical compression schemes, while
UTF-16 (with surrogate pairs) rather reflects a tree-like structure of
"planes", "groups", "blocks", "codepages" etc., favored by the Unicode
Consortium. Such a view may be interesting to font designers, who can
restrict a font to part of the full Unicode range, but it is of little
help when handling Unicode programmatically.
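
For the conversion named in the subject line, the two encodings can be
handled as separate steps: first undo the surrogate pairing to get a
plain code point, then pack that number into UTF-8 bytes. The following
Python sketch shows one way to do this. The function names are
hypothetical and this is not the Lazarus/FPC implementation; it is
hand-written only to show the arithmetic, since Python's built-in
codecs already do the same job.

```python
def utf16_to_code_points(units):
    """Decode a sequence of 16-bit code units into code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate at index %d" % i)
            lo = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            # BMP code point: the unit is the value itself.
            out.append(u)
            i += 1
    return out


def code_point_to_utf8(cp):
    """Pack one code point into 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_to_utf8(units):
    """Convert UTF-16 code units to a UTF-8 byte string."""
    return b"".join(code_point_to_utf8(cp) for cp in utf16_to_code_points(units))


# 'A', the euro sign, and one character outside the BMP (a surrogate pair).
sample = [0x0041, 0x20AC, 0xD83D, 0xDE00]
print(utf16_to_utf8(sample).hex())
```

Only the first function knows about surrogates; the UTF-8 step sees
nothing but a number, which matches the "numerical scheme" view above.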

DoDi




