[Lazarus] Does Lazarus support a complete Unicode Component Library?

Sven Barth pascaldragon at googlemail.com
Sat Jan 1 19:13:26 CET 2011


On 01.01.2011 15:25, Juha Manninen wrote:
>
>     It is the same as the new Delphi unicode support, I think. All GUI
>     components support unicode out of the box, it uses UTF8 encoded
>     strings.
>     AFAIK Delphi uses UTF16, so in that way it is different.
>
>
> I must ask a newbie question again. I never needed to pay attention to
> this because char encodings in GUIs have worked well for my purposes.
>
> The GUI text properties have type "string" which is ansistring with the
> normal H+ setting.
> TCaption is defined as "string", too.
> Examples: TEdit.Text, TMemo.Lines[0]
> What happens when I do:
>    var s: string;
>    ...
>    s := TMemo.Lines[0];
>
> Is it converted somehow?
> The native widget's encoding is either UTF-8 or UTF-16.
> Is the string actually a Utf8String or Utf16String then?
> When do I need to pay attention to it?

Currently there is no automatic conversion (it's planned in one of the
development branches of FPC). For now a String (= AnsiString) is best
thought of as an "array of byte": you, as the developer, are responsible
for making sure it contains text in the correct encoding.

So in your example above, the string stored in "s" will be UTF-8
encoded, because it comes from the GUI. But if that string contains
multibyte characters, each of them shows up as several separate
one-byte characters when you access the string with [], Pos, Copy,
Length, etc.
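Because Length and [] work on bytes, counting real characters means walking over the bytes yourself. Here is a minimal sketch in Free Pascal, assuming the string holds valid UTF-8. The helper Utf8CodePointCount is hypothetical: it is hand-written here and is not an FPC or LCL routine. The sample 'hä?!' is built from its actual UTF-8 bytes, so the result does not depend on the source file's encoding.

```pascal
program CountUtf8CodePoints;

{$mode objfpc}{$H+}

const
  // 'hä?!' spelled out as UTF-8 bytes; the a-umlaut (U+00E4) is $C3 $A4
  SampleBytes: array[1..5] of Byte = ($68, $C3, $A4, $3F, $21);

// Hypothetical helper: returns a plain string holding the raw UTF-8
// bytes, i.e. roughly what a GUI control hands back for 'hä?!'.
function SampleUtf8: string;
begin
  SetLength(Result, Length(SampleBytes));
  Move(SampleBytes[1], Result[1], Length(SampleBytes));
end;

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes, whose top two bits are 10 (mask $C0 gives $80).
function Utf8CodePointCount(const s: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

begin
  WriteLn('Length (bytes): ', Length(SampleUtf8));             // 5
  WriteLn('Code points:    ', Utf8CodePointCount(SampleUtf8)); // 4
end.
```

The byte count (5) and the character count (4) differ by exactly the extra byte the a-umlaut needs.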

Example (note: this is not the real UTF-8 encoding, I'm just making it
up here):

TMemo.Lines[0] contains: 'hä?!' ( h a-umlaut ? ! )
Now assume that an a-umlaut is encoded as "ae" (which isn't really the
case, but it's for the sake of the example ^^).
s now contains: 'h a e ? !'

If you now access the second character of s, you'd expect to get the
a-umlaut, but s[2] gives you an "a". And if you access the third one
(s[3]), you get the "e" instead of the "?".
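With the real UTF-8 encoding the effect is the same and only the bytes differ: the a-umlaut (U+00E4) is stored as the two bytes $C3 $A4. The following small sketch dumps the bytes of 'hä?!'. SampleUtf8 is a hypothetical helper that builds the string from raw bytes.

```pascal
program Utf8ByteIndexing;

{$mode objfpc}{$H+}

uses
  SysUtils; // IntToHex

const
  // 'hä?!' as UTF-8 bytes: h, $C3 $A4 (a-umlaut), ?, !
  SampleBytes: array[1..5] of Byte = ($68, $C3, $A4, $3F, $21);

// Hypothetical helper: a plain string filled with raw UTF-8 bytes.
function SampleUtf8: string;
begin
  SetLength(Result, Length(SampleBytes));
  Move(SampleBytes[1], Result[1], Length(SampleBytes));
end;

var
  s: string;
  i: Integer;
begin
  s := SampleUtf8;
  // s[i] yields one byte per index, not one character
  for i := 1 to Length(s) do
    WriteLn('s[', i, '] = $', IntToHex(Ord(s[i]), 2));
end.
```

It prints s[2] = $C3 and s[3] = $A4, which is where you might expect the a-umlaut and the "?".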

To work with whole characters, convert the UTF-8 string to a different
encoding, e.g. UTF-16:

var
   us: UnicodeString;  // UTF-16 encoded
begin
   us := UTF8Decode(s);  // UTF-8 -> UTF-16 (UTF8Encode goes the other way)
end;

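Now us[2] returns the a-umlaut. (Characters outside the Basic
Multilingual Plane still take two UTF-16 elements, a surrogate pair, so
this only holds for characters inside it.)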
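The following self-contained sketch shows the conversion in both directions. It assumes the System-unit routines UTF8Decode and UTF8Encode. The helper names SampleUtf8 and DecodeSample are hypothetical.

```pascal
program Utf8ToUtf16;

{$mode objfpc}{$H+}

const
  // 'hä?!' as UTF-8 bytes; the a-umlaut (U+00E4) is $C3 $A4
  SampleBytes: array[1..5] of Byte = ($68, $C3, $A4, $3F, $21);

// Hypothetical helper: a plain string filled with raw UTF-8 bytes,
// standing in for text read from a GUI control.
function SampleUtf8: string;
begin
  SetLength(Result, Length(SampleBytes));
  Move(SampleBytes[1], Result[1], Length(SampleBytes));
end;

// Hypothetical helper: UTF8Decode turns UTF-8 bytes into a UTF-16
// UnicodeString.
function DecodeSample: UnicodeString;
begin
  Result := UTF8Decode(SampleUtf8);
end;

var
  us: UnicodeString;
begin
  us := DecodeSample;
  WriteLn('Length (UTF-16 elements): ', Length(us)); // 4
  WriteLn('Ord(us[2]) = ', Ord(us[2]));              // 228 = $E4, the a-umlaut
  // UTF8Encode converts back: UnicodeString -> UTF-8 bytes
  WriteLn('UTF-8 length again: ', Length(UTF8Encode(us))); // 5
end.
```

Here the UTF-16 string has one element per character (4), while the round trip through UTF8Encode gives back the original 5 bytes.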

I hope this example clears things up a bit; if not, just ask more questions ;)

Regards,
Sven

