[Lazarus] Utf8ToSys on Linux and cwstring in uses clause

Vladimir Zhirov vvzh.lists at gmail.com
Fri Nov 12 18:44:12 CET 2010


I've just tried to use Utf8ToConsole on my Linux box and was
very surprised by the result.

If I run a simple program like this:
> program project1;
> {$mode objfpc}{$H+}
> uses
>   FileUtil;
> begin
>   WriteLn(UTF8ToConsole('Text in Russian:'));
>   WriteLn(UTF8ToConsole('Текст на русском'));
> end.

It produces the following output:
> Text in Russian:
> ????? ?? ???????

My LANG environment variable is ru_RU.UTF-8.
If I add cwstring to my uses clause, it works OK.
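
For reference, here is a minimal sketch of the variant that works for me.
It is the same program with cwstring added to the uses clause. The
{$IFDEF UNIX} guard and the position of cwstring at the start of the
clause are my assumptions (the usual idiom), not something FileUtil
requires as far as I know. The source file is assumed to be saved as UTF-8.

```pascal
program project1;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}
  cwstring,  // installs a widestring manager backed by libc/iconv
  {$ENDIF}
  FileUtil;  // provides UTF8ToConsole, as in the original program
begin
  WriteLn(UTF8ToConsole('Text in Russian:'));
  WriteLn(UTF8ToConsole('Текст на русском'));
end.
```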

While trying to figure out what happens, I noticed an inconsistency
in the FileUtil.NeedRTLAnsi function. The comment at line 186 says
NeedRTLAnsi is "true if system encoding is not UTF-8", but the
function itself contains the following code:
> FNeedRTLAnsi:=(SysUtils.CompareText(Encoding,'UTF-8')=0)
>              or (SysUtils.CompareText(Encoding,'UTF8')=0);

So it looks like the reverse: NeedRTLAnsi is true if the system
encoding IS UTF-8. This causes a redundant Utf8ToAnsi call in
Utf8ToSys, which turns non-ASCII text into question marks in the
absence of a widestring manager (cwstring).

Is it a bug in NeedRTLAnsi? If it is, the fix would be trivial:
> FNeedRTLAnsi:=(SysUtils.CompareText(Encoding,'UTF-8')<>0)
>             and (SysUtils.CompareText(Encoding,'UTF8')<>0);
With this change everything works as expected, at least for me.
If it is a bug, should I also report it to Mantis?
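
To make the difference concrete, here is a small standalone sketch that
evaluates both expressions for a few encoding names. CurrentCheck and
ProposedCheck are hypothetical helper names I made up for this test; they
only wrap the two expressions quoted above and are not part of FileUtil.
The sample encoding names are likewise just illustrative.

```pascal
program needrtlansi_check;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: the expression currently found in NeedRTLAnsi.
// It is true when the encoding IS UTF-8.
function CurrentCheck(const Encoding: string): Boolean;
begin
  Result := (SysUtils.CompareText(Encoding, 'UTF-8') = 0)
         or (SysUtils.CompareText(Encoding, 'UTF8') = 0);
end;

// Hypothetical helper: the proposed expression.
// It is true when the encoding is NOT UTF-8, as the comment describes.
function ProposedCheck(const Encoding: string): Boolean;
begin
  Result := (SysUtils.CompareText(Encoding, 'UTF-8') <> 0)
        and (SysUtils.CompareText(Encoding, 'UTF8') <> 0);
end;

const
  // Illustrative encoding names; CompareText ignores case.
  Samples: array[0..2] of string = ('UTF-8', 'utf8', 'KOI8-R');
var
  i: Integer;
begin
  for i := Low(Samples) to High(Samples) do
    WriteLn(Samples[i], ': current=', CurrentCheck(Samples[i]),
            ' proposed=', ProposedCheck(Samples[i]));
end.
```

For every name the two results are opposite, so only the proposed
expression agrees with the comment "true if system encoding is not UTF-8".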

Or is this the expected behavior of NeedRTLAnsi and just a misprint in
the comment? In that case, should I always use cwstring and put up
with the libc/iconv dependency?

Thanks in advance.
