[Lazarus] Does Lazarus support a complete Unicode Component Library?

Sun Jan 2 12:47:28 CET 2011

On 01.01.2011 22:29, Juha Manninen wrote:
> Vladimir Zhirov wrote on Saturday, 01 January 2011 22:14:32:
>> Sven Barth wrote:
>>> You need to convert the UTF8 string to a different one, e.g.
>>> UTF16:
>>>
>>> var
>>>     us: UnicodeString;
>>> begin
>>>     us := UTF8Encode(s);
>>> end;
>>>
>>> Now us[2] will return the a-umlaut.
>>
>> I would suggest using Utf8Copy(s, 2, 1) instead. It helps
>> to avoid conversion and works correctly even for characters
>> that take 4 bytes in UnicodeString/WideString (i.e. 2
>> wide characters). Utf8Copy is declared in LCLProc unit.
>
> So the conversion is only needed if a char inside the string is accessed by
> index?
>

If you use the LCL in your application, you can also use UTF8Copy, 
which Vladimir mentioned.

Let's say it this way: if your String contains UTF-8 encoded text, you 
should not use [] or the normal Pos, Copy, etc. functions, because they 
work on bytes rather than characters and might return garbage for 
anything outside ASCII. Use functions that can work with that encoding 
(either by converting the string or working directly on it).
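
As a minimal sketch of the difference, assuming a plain FPC console program: the test string and the program name are only illustrative, and the a-umlaut is written as explicit bytes so the source file encoding does not matter.

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

var
  s: AnsiString;      // holds UTF-8 encoded bytes
  us: UnicodeString;  // holds UTF-16 code units
begin
  // 'h', a-umlaut (UTF-8 bytes C3 A4), 'llo'
  s := 'h' + #$C3 + #$A4 + 'llo';

  WriteLn(Length(s));   // 6: Length counts bytes, not characters
  WriteLn(Ord(s[2]));   // 195 ($C3): only the first byte of the a-umlaut

  // Convert UTF-8 to UTF-16 before indexing
  us := UTF8Decode(s);

  WriteLn(Length(us));  // 5: one code unit per character here
  WriteLn(Ord(us[2]));  // 228 ($E4): the a-umlaut
end.
```

Indexing the UnicodeString is only safe as long as every character fits into a single UTF-16 code unit, which is the limitation Vladimir pointed out.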

> I understand the principle but I didn't understand how the functions
> UTF8Encode and UTF8Decode work. Of course I don't need to understand such
> details because I am not an FPC developer, but anyway ...
>
> UTF8Encode returns UTF8String and the AnsiString parameter is internally
> typecast to UnicodeString. How can that work?
>

You looked at the wrong function. I meant the one below it, which takes a 
UnicodeString as its argument. And this also solves the mystery:

Casting from AnsiString to UnicodeString invokes the WideString 
Manager's Ansi2UnicodeMoveProc, which converts the supplied AnsiString to 
a correct UTF-16 string. Then the function that takes a UnicodeString 
as its argument is invoked (it's an overloaded function after all) and the 
UTF-16 string is converted to UTF-8.
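
The two steps can be written out explicitly, roughly like this. It is a sketch only: the variable names are illustrative, and the text is plain ASCII so the result does not depend on the system code page or on which widestring manager is installed.

```pascal
program CastChainDemo;
{$mode objfpc}{$H+}

var
  a: AnsiString;
  u: UnicodeString;
  utf8: UTF8String;
begin
  a := 'abc';

  // Step 1: the cast goes through the widestring manager and
  // yields a UTF-16 string. (On Unix, non-ASCII text needs a real
  // widestring manager, e.g. from the cwstring unit.)
  u := UnicodeString(a);

  // Step 2: the UnicodeString overload of UTF8Encode does the
  // UTF-16 to UTF-8 conversion.
  utf8 := UTF8Encode(u);

  WriteLn(Length(u));     // 3
  WriteLn(Length(utf8));  // 3: ASCII takes one byte per character
end.
```

Calling UTF8Encode(a) directly performs step 1 implicitly, which is why it compiles even though the parameter is a UnicodeString.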

> Maybe Sven's example should use UTF8Decode. It returns UnicodeString.
> According to the debugger, both functions convert the string to uppercase
> and add some garbage to the beginning and end, but it may be a debugger
> error.

Yes, it should have used UTF8Decode. I used the wrong function. -.-
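
To keep the two directions apart, here is a small round-trip sketch (again assuming a plain FPC console program; names are illustrative and the a-umlaut is written as explicit bytes):

```pascal
program RoundTripDemo;
{$mode objfpc}{$H+}

var
  s: AnsiString;      // UTF-8 encoded input
  us: UnicodeString;  // UTF-16
  back: UTF8String;   // UTF-8 again
begin
  s := 'h' + #$C3 + #$A4 + 'llo';

  us := UTF8Decode(s);     // UTF-8  -> UTF-16
  back := UTF8Encode(us);  // UTF-16 -> UTF-8

  WriteLn(Length(us));     // 5
  WriteLn(Length(back));   // 6: same byte count as the input
  WriteLn(Ord(back[2]));   // 195 ($C3): same bytes as the input
end.
```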

Regards,
Sven