[Lazarus] Does Lazarus support a complete Unicode Component Library?
Vladimir Zhirov
vvzh.lists at gmail.com
Sat Jan 1 23:51:42 CET 2011
Juha Manninen wrote:
> So the conversion is only needed if a char inside the string
> is accessed by index?
No, the conversion is completely optional.
Summing up what was suggested, there are two ways to access a character
by index in a UTF-8 string:
1. Convert it to WideString/UnicodeString and use MyWideString[Index];
2. Use Utf8Copy(MyString, Index, 1);
The limitation of the first approach is that it assumes every character
fits in 2 bytes (one WideChar). Characters outside the Basic Multilingual
Plane are stored as two WideChars (a surrogate pair), so indexing returns
only half of such a character. This affects some languages and some
special symbols (see
http://en.wikipedia.org/wiki/Supplementary_Multilingual_Plane#Supplementary_Multilingual_Plane
for the list of them). So this approach does not support "true" Unicode,
but it works in most cases.
The second approach should handle this correctly (provided there are no bugs).
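To make the difference concrete, here is a minimal sketch of both
approaches. It assumes a recent Lazarus where UTF8Copy and UTF8Length
live in the LazUTF8 unit (older releases declare them in LCLProc); the
program name and the test string are made up for illustration.

```pascal
program Utf8IndexDemo;

{$mode objfpc}{$H+}

uses
  LazUTF8; // assumption: recent Lazarus; older releases use LCLProc instead

var
  S: string;         // holds UTF-8 encoded bytes
  W: UnicodeString;  // UTF-16 copy of the same text
begin
  // 'a', then U+00E9 (2 bytes in UTF-8), then U+1D11E (4 bytes in UTF-8,
  // outside the Basic Multilingual Plane). Written as raw bytes so the
  // result does not depend on the source file encoding.
  S := 'a'#$C3#$A9#$F0#$9D#$84#$9E;

  // Approach 1: convert to UTF-16 and index by WideChar.
  W := UTF8Decode(S);
  // Length counts UTF-16 code units, so the last character counts twice.
  WriteLn('UTF-16 code units: ', Length(W));
  // W[3] is only the first half (high surrogate) of U+1D11E.
  WriteLn('W[3] = $', HexStr(Ord(W[3]), 4));

  // Approach 2: stay in UTF-8 and index by character.
  WriteLn('UTF-8 characters:  ', UTF8Length(S));
  // The third character comes back whole, as a multi-byte substring.
  WriteLn('Bytes in 3rd char: ', Length(UTF8Copy(S, 3, 1)));
end.
```

Note that UTF8Copy has to scan the string from the start to find the
requested character, so it is not a constant-time operation like
MyWideString[Index].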
> UTF8Encode returns UTF8String and the AnsiString parameter is
> internally typecasted to UnicodeString. How can that work?
>
> Maybe Sven's example should use UTF8Decode.
Sure, UTF8Decode should have been used in this case.
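For reference, a small sketch of which function goes in which direction.
The variable names are made up, and the text is built on the UTF-16 side
so the example does not depend on the source file encoding.

```pascal
program Utf8RoundTrip;

{$mode objfpc}{$H+}

var
  Wide: UnicodeString;    // UTF-16 text
  Utf8Text: UTF8String;   // the same text as UTF-8 bytes
  Back: UnicodeString;
begin
  // 'caf' followed by U+00E9
  Wide := UnicodeString('caf') + WideChar($00E9);

  // UTF8Encode: UTF-16 -> UTF-8
  Utf8Text := UTF8Encode(Wide);
  WriteLn('UTF-16 code units: ', Length(Wide));
  WriteLn('UTF-8 bytes:       ', Length(Utf8Text));

  // UTF8Decode: UTF-8 -> UTF-16
  Back := UTF8Decode(Utf8Text);
  WriteLn('Round trip equal:  ', Back = Wide);
end.
```

So a string that already holds UTF-8 should go through UTF8Decode to get
a UnicodeString; passing it to UTF8Encode would treat it as if it were
not yet encoded.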