[Lazarus] UTF16 2 utf8
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Thu May 5 16:40:38 CEST 2011
José Mejuto schrieb:
> I think the text saying that UCS2 has been extended does not mean
> that UCS2 itself has been extended; it says that UCS2 has been extended
> to UTF-16, so UCS2 cannot be considered Unicode anymore, as noted in
> ISO 10646:
>
> UCS-2. UCS-2 stands for "Universal Character Set coded in 2 octets" and is also known as
> "the two-octet BMP form." It was documented in earlier editions of 10646 as the two-octet
> (16-bit) encoding consisting only of code positions for plane zero, the Basic Multilingual
> Plane. This documentation has been removed from ISO/IEC 10646:2011, and the term
> UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either
> 10646 or the Unicode Standard.
I agree that UCS-2 no longer represents the current Unicode range, but
it is still a true subset of UCS-4 (the BMP).
The UCS standards define Unicode as ranges of values, while the UTF
standards define encodings.
The UTF-7/8 encodings are purely numerical compression schemes, while
UTF-16 (with surrogate pairs) reflects more closely the tree-like
structure of "planes", "groups", "blocks", "codepages" etc. favored by
the Unicode Consortium. Such a view may be interesting to font writers,
who can restrict a font to part of the full Unicode range, but it is of
little help with handling Unicode programmatically.
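
To make the difference concrete, here is a minimal sketch in Python (not Lazarus code; the function names utf16_units and utf8_bytes are made up for this illustration). It assumes the standard bit layouts of UTF-8 and UTF-16: a BMP code point maps to one 16-bit unit with the same value it had in UCS-2, a code point beyond the BMP is split into a surrogate pair, and UTF-8 simply packs the numeric value into 1 to 4 bytes.

```python
def _check(cp):
    # Only Unicode scalar values are encodable: 0..0x10FFFF minus surrogates.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % (cp,))


def utf16_units(cp):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    _check(cp)
    if cp < 0x10000:
        # BMP: a single unit, numerically identical to the UCS-2 value.
        return [cp]
    v = cp - 0x10000  # 20 bits remain for the supplementary planes
    high = 0xD800 | (v >> 10)    # upper 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)   # lower 10 bits go into the low surrogate
    return [high, low]


def utf8_bytes(cp):
    """Return the list of UTF-8 bytes for one code point."""
    _check(cp)
    if cp < 0x80:
        return [cp]
    if cp < 0x800:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1D11E):
        units = " ".join("%04X" % u for u in utf16_units(cp))
        octets = " ".join("%02X" % b for b in utf8_bytes(cp))
        print("U+%04X  UTF-16: %s  UTF-8: %s" % (cp, units, octets))
```

For U+20AC (inside the BMP) the UTF-16 result is the single unit 20AC, while U+1D11E (outside the BMP) becomes the pair D834 DD1E. In both cases the UTF-8 branch is chosen purely by the magnitude of the value, without any notion of planes.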
DoDi