[Lazarus] UTF16 2 utf8

Thu May 5 16:40:38 CEST 2011

José Mejuto wrote:

> I think that the text saying UCS2 has been extended does not mean
> that UCS2 itself was extended; it says that UCS2 has been extended
> to UTF-16, so UCS2 cannot be considered Unicode anymore, as noted in
> ISO 10646:
> 
> UCS-2. UCS-2 stands for "Universal Character Set coded in 2 octets" and is also known as
> "the two-octet BMP form." It was documented in earlier editions of 10646 as the two-octet
> (16-bit) encoding consisting only of code positions for plane zero, the Basic Multilingual
> Plane. This documentation has been removed from ISO/IEC 10646:2011, and the term
> UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either
> 10646 or the Unicode Standard.

I agree that UCS-2 no longer covers the current Unicode range, but 
it is still a true subset of UCS-4 (the BMP).

The UCS standards define Unicode as ranges of values, while the UTF 
standards define encodings.
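
To make the distinction concrete, here is a small Python sketch (not 
taken from any Lazarus unit; the helper names in_bmp and encodings are 
made up for illustration). A code point is just a number in the UCS 
range, and each UTF turns that same number into a different byte 
sequence:

```python
# A code point is a plain integer; an encoding maps it to bytes.
# in_bmp() and encodings() are hypothetical helper names for this sketch.

def in_bmp(cp: int) -> bool:
    """True if the code point lies in plane 0 (the former UCS-2 range)."""
    return 0 <= cp <= 0xFFFF

def encodings(cp: int) -> dict:
    """Return the hex form of one code point in three UTF encodings."""
    ch = chr(cp)
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }

if __name__ == "__main__":
    # U+0041 and U+20AC are in the BMP, U+1D11E is in plane 1.
    for cp in (0x41, 0x20AC, 0x1D11E):
        print(f"U+{cp:04X} in BMP: {in_bmp(cp)} -> {encodings(cp)}")
```

For a BMP code point the UTF-16 code unit equals the code point value, 
which is why UCS-2 text is also valid UTF-16. Outside the BMP, UTF-16 
needs two code units.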

The UTF-7/8 encodings are purely numerical compression schemes, while 
UTF-16 (with surrogate pairs) reflects more of the tree-like structure 
of "planes", "groups", "blocks", "codepages" etc. favored by the Unicode 
Consortium. Such a view may be interesting to font designers, who can 
restrict a font to part of the full Unicode range, but it is of little 
help when handling Unicode programmatically.
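
For the conversion in the subject line, the arithmetic can be sketched 
as follows. This is an illustrative Python sketch, not the routine 
Lazarus or the FPC RTL uses, and the function names are my own. It 
first folds surrogate pairs back into one code point, then emits the 
UTF-8 bytes for that number:

```python
# Sketch of UTF-16 -> UTF-8 conversion by hand.
# All function names here are hypothetical, chosen for this example.

def decode_utf16_units(units):
    """Turn a sequence of 16-bit code units into a list of code points."""
    cps = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            lo = units[i + 1]
            cps.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # BMP code unit: the value is the code point itself.
            cps.append(u)
            i += 1
    return cps

def encode_utf8(cp):
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_to_utf8(units):
    """Convert UTF-16 code units to a UTF-8 byte string."""
    return b"".join(encode_utf8(cp) for cp in decode_utf16_units(units))

if __name__ == "__main__":
    # 'A', EURO SIGN, and U+1D11E as the surrogate pair D834 DD1E.
    print(utf16_to_utf8([0x0041, 0x20AC, 0xD834, 0xDD1E]).hex())
```

The surrogate step is the only place where the plane structure shows 
up; the UTF-8 step works on the plain number.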

DoDi