[Lazarus] Case error with cirilyc

dmitry boyarintsev skalogryz.lists at gmail.com
Fri Sep 18 06:22:52 CEST 2009


--- with UTF8Scanner ----

If you're using UTF8Scanner, your source files should be UTF8 encoded.
You can enable UTF8 encoding the same way you enabled Ansi encoding.

But since you're on Windows, the FPC compiler treats all your sources
as Ansi (even if they're UTF8 encoded).
You should add {$codepage UTF8} before the 'interface' section, so the
compiler knows that UTF8 is used.
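
For illustration, here is roughly where the directive would go. This is
only a skeleton: the unit name "Unit1" and the mode line are assumptions,
and your own unit will of course have real declarations in it.

```pascal
unit Unit1;  // hypothetical unit name

{$mode objfpc}{$H+}
{$codepage UTF8}  // declares that this source file is UTF8 encoded

interface

// form and type declarations go here

implementation

end.
```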

--- without UTF8Scanner ----

On Thu, Sep 17, 2009 at 7:46 PM, Rigel Rigel <rigel at gbg.bg> wrote:
>  It works with lat symbol on the right but if on the right of CASE is cyrillic symbol in Memo2.text shows '?' instead correct symbol.

You're getting the '?' character because the LCL expects a UTF8 encoded
string, while you're filling stOut with ANSI characters.
The code should be the following:

procedure TForm1.Button1Click(Sender: TObject);
var
  stInp: String;
  stOut: String;
  i: integer;
begin
  stOut := '';
  { convert the UTF8 string to Ansi for Case processing }
  stInp := UTF8ToAnsi(Memo1.Lines.Text);
  for i := 1 to length(stInp) do
    case stInp[i] of // ANSI case processing
      'а': stOut := stOut + 'а'; // Cyrillic (showed '?' in Memo2 before)
      'б': stOut := stOut + 'б'; // Cyrillic
      'a': stOut := stOut + 'a'; // Latin
      'b': stOut := stOut + 'b'; // Latin
    end;
  { stOut is ANSI encoded, LCL expects UTF8 strings, so you need to encode it }
  Memo2.Lines.Text := UTF8Encode(stOut);
end;

The String type itself doesn't tell you whether it holds Ansi or UTF8
data. It's up to you to keep track of each string's encoding.
Keep in mind that standard LCL components ALWAYS use UTF8 encoded strings.
For Delphi compatibility you might need to convert them to Ansi
strings, but you must convert Ansi strings back to UTF8 if you want to
assign them to an LCL component.
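
The convert / process / convert back pattern can be kept in one place.
A minimal sketch, assuming a helper named ProcessAsAnsi (a hypothetical
name, not part of the LCL). It uses AnsiToUtf8 from the System unit as
the counterpart to Utf8ToAnsi. The result for non-ASCII text depends on
the system code page, so the demo call uses plain Latin letters only.

```pascal
program RoundTripSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: takes UTF8 text (as the LCL delivers it),
// works on it as Ansi, and hands back UTF8 again.
function ProcessAsAnsi(const Utf8Text: string): string;
var
  AnsiText, AnsiOut: string;
  i: Integer;
begin
  AnsiText := Utf8ToAnsi(Utf8Text);   // UTF8 -> Ansi, one char per byte
  AnsiOut := '';
  for i := 1 to Length(AnsiText) do
    case AnsiText[i] of               // per-character Ansi processing
      'a': AnsiOut := AnsiOut + 'a';
      'b': AnsiOut := AnsiOut + 'b';
    else
      AnsiOut := AnsiOut + AnsiText[i];
    end;
  Result := AnsiToUtf8(AnsiOut);      // Ansi -> UTF8 before it reaches the LCL
end;

begin
  WriteLn(ProcessAsAnsi('ab'));
end.
```

In a form you would then write something like
Memo2.Lines.Text := ProcessAsAnsi(Memo1.Lines.Text);
so the Ansi strings never leave the helper.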

Also, if you're using an Ansi encoded file, you need to convert string
constants to UTF8 as well, for example:

Label1.Caption := UTF8Encode('язык ада');

because Label1, as a standard LCL component, expects a UTF8 string.


The whole "mess" with UTF8/Ansi strings is about code portability
between platforms.

thanks,
dmitry



