[Lazarus] Does Lazarus support a complete Unicode Component Library?
Sven Barth
pascaldragon at googlemail.com
Sun Jan 2 12:47:28 CET 2011
On 01.01.2011 22:29, Juha Manninen wrote:
> Vladimir Zhirov wrote on Saturday 01 January 2011 22:14:32:
>> Sven Barth wrote:
>>> You need to convert the UTF8 string to a different one, e.g.
>>> UTF16:
>>>
>>> var
>>>   us: UnicodeString;
>>> begin
>>>   us := UTF8Encode(s);
>>> end;
>>>
>>> Now us[2] will return the a-umlaut.
>>
>> I would suggest using Utf8Copy(s, 2, 1) instead. It helps
>> to avoid conversion and works correctly even for characters
>> that take 4 bytes in UnicodeString/WideString (i.e. 2
>> wide characters). Utf8Copy is declared in LCLProc unit.
>
> So the conversion is only needed if a char inside the string is accessed by
> index?
>
If you use the LCL in your application, you can also use UTF8Copy,
which Vladimir mentioned.
Let me put it this way: if your String contains UTF-8 encoded text, you
should not use [] or the normal Pos, Copy, etc. functions, because they
count bytes, not characters, and might return garbage (e.g. half of a
multi-byte character). Use functions that can work with that encoding
(either by converting the string or by working directly on it).
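A minimal sketch of the second option, working directly on the UTF-8 bytes, assuming Free Pascal in {$mode objfpc}{$H+}. `SketchUtf8Copy`, `CodePointStart`, `SeqLen` and `MakeSample` are hypothetical names made up for this illustration; this is not the LCL's actual UTF8Copy, only a function meant to behave like it for valid UTF-8 input. The sample string is assigned byte by byte so the result does not depend on the source file's encoding.

```pascal
program Utf8CopySketch;

{$mode objfpc}{$H+}

// Number of bytes in the UTF-8 sequence that starts with lead byte b.
function SeqLen(b: Byte): Integer;
begin
  if b < $80 then Result := 1                  // 0xxxxxxx: ASCII
  else if (b and $E0) = $C0 then Result := 2   // 110xxxxx
  else if (b and $F0) = $E0 then Result := 3   // 1110xxxx
  else if (b and $F8) = $F0 then Result := 4   // 11110xxx
  else Result := 1;                            // invalid lead byte: step over it alone
end;

// 1-based byte index at which code point number n (1-based) starts;
// Length(s) + 1 if n lies past the end of the string.
function CodePointStart(const s: RawByteString; n: Integer): Integer;
var
  cp: Integer;
begin
  Result := 1;
  cp := 1;
  while (cp < n) and (Result <= Length(s)) do
  begin
    Inc(Result, SeqLen(Ord(s[Result])));
    Inc(cp);
  end;
  if Result > Length(s) + 1 then
    Result := Length(s) + 1;                   // truncated final sequence
end;

// Hypothetical stand-in for UTF8Copy: Count code points starting at code
// point Start, never splitting a multi-byte sequence.
function SketchUtf8Copy(const s: RawByteString; Start, Count: Integer): RawByteString;
var
  first, afterLast: Integer;
begin
  first := CodePointStart(s, Start);
  afterLast := CodePointStart(s, Start + Count);
  Result := Copy(s, first, afterLast - first);
end;

// Hypothetical sample: 'B', U+00E4 ($C3 $A4), 'r', U+1F600 ($F0 $9F $98 $80),
// i.e. 4 code points in 8 bytes, assigned byte by byte.
function MakeSample: UTF8String;
begin
  SetLength(Result, 8);
  Result[1] := 'B';
  Result[2] := Chr($C3); Result[3] := Chr($A4);
  Result[4] := 'r';
  Result[5] := Chr($F0); Result[6] := Chr($9F);
  Result[7] := Chr($98); Result[8] := Chr($80);
end;

var
  s: UTF8String;
begin
  s := MakeSample;
  WriteLn(Length(Copy(s, 2, 1)));            // 1: byte-based Copy yields only $C3
  WriteLn(Length(SketchUtf8Copy(s, 2, 1)));  // 2: the whole a-umlaut
  WriteLn(Length(SketchUtf8Copy(s, 4, 1)));  // 4: the whole U+1F600
end.
```

The sketch only ever steps from one lead byte to the next, so a copy can never start or end inside a multi-byte character, including the 4-byte one that would need two wide characters in a UnicodeString. It does not validate the input beyond stepping over stray bytes one at a time.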
> I understand the principle, but I didn't understand how the functions
> UTF8Encode and UTF8Decode work. Of course I don't need to understand such
> details because I am not an FPC developer, but anyway ...
>
> UTF8Encode returns UTF8String and the AnsiString parameter is internally
> typecast to UnicodeString. How can that work?
>
You looked at the wrong function. I meant the one below it, which takes a
UnicodeString as its argument. And this also solves the mystery:
casting from AnsiString to UnicodeString invokes the WideString
Manager's Ansi2UnicodeMoveProc, which converts the supplied AnsiString to
a correct UTF-16 string. Then the function that takes a UnicodeString
as its argument is invoked (it's an overloaded function after all), and the
UTF-16 string is converted to UTF-8.
> Maybe Sven's example should use UTF8Decode. It returns UnicodeString.
> According to the debugger, both functions convert the string to uppercase and add
> some garbage to the beginning and end, but it may be a debugger error.
Yes, it should have used UTF8Decode. I used the wrong function. -.-
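For reference, a sketch of the quoted example with UTF8Decode in place of UTF8Encode, assuming Free Pascal with {$mode objfpc}{$H+}. The string 'Bär' and the helper `MakeBaer` are hypothetical, chosen only for illustration; the bytes are assigned one by one so the program does not depend on the encoding of the source file.

```pascal
program Utf8DecodeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils; // IntToHex

// Hypothetical helper: the UTF-8 bytes of 'Bär', assigned one by one so the
// result does not depend on the source file encoding.
function MakeBaer: UTF8String;
begin
  SetLength(Result, 4);
  Result[1] := 'B';
  Result[2] := Chr($C3); // first byte of U+00E4 (a-umlaut)
  Result[3] := Chr($A4); // second byte of U+00E4
  Result[4] := 'r';
end;

var
  s: UTF8String;
  us: UnicodeString;
begin
  s := MakeBaer;
  us := UTF8Decode(s); // UTF-8 -> UTF-16 (UTF8Encode goes the other way)
  WriteLn('bytes in s:       ', Length(s));   // 4
  WriteLn('code units in us: ', Length(us));  // 3
  // us[2] is the whole a-umlaut, whereas s[2] is only its first byte, $C3
  WriteLn('us[2] = U+', IntToHex(Ord(us[2]), 4)); // U+00E4
end.
```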
Regards,
Sven