[Lazarus] FPC 2.7.1 and console output
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Dec 3 16:46:43 CET 2014
Mattias Gaertner wrote:
> On Tue, 02 Dec 2014 22:41:04 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>
>> [...]
>> I can see two major problems with the current FPC AnsiString model. The
>> first problem is the strange FPC convention, that a string variable can
>> have a different static/dynamic encoding, not only with RawByteString.
>> That convention (flaw) can require an explicit SetCodePage for every
>> string parameter, because a string argument of e.g. static type CP_OEM
>> (for console output) can have any other actual (dynamic) encoding, not
>> useful when passing the string to the external function.
>
> The FPC sources need SetCodePage only in the RTL and only either for
> codepage conversion functions or for Default(RTL)FileSystemCodePage.
> It seems it is not a "major" problem for Lazarus users.
Let's see. I am currently trying to debug AnsiUpperCase, which doesn't
seem to work.
Q: How do I debug the RTL (step into it)?
>> The next problem results from the Delphi incompatible dynamic encoding
>> of CP_ACP(=0), that seems to be used when a literal is stored in an
>> AnsiString. These strings have the encoding assumed at *compile time*,
>> perhaps from a {$codepage ...} switch, which can differ from the
>> DefaultSystemCodepage at *runtime*. Then the conversion routines assume
>> that the string is encoded according to DefaultSystemCodepage, which is
>> not necessarily true:
>>
>> var
>> A: AnsiString;
>> begin
>> a := ' äöü';
>> WriteLn('CP_ACP=',DefaultSystemCodePage);
>> WriteLn('Ansi CP=',StringCodePage(a),' Len=',Length(a),' ="',a,'"');
>> end.
>>
>> Reports (on Windows) CP_ACP=1252, string CP=0, and due to the Lazarus
>> File Encoding of UTF-8 the string literal and variable contains UTF-8
>> (Len=7), as assumed by the compiler. The attempt in WriteLn, to convert
>> the string to CP_OEM from encoding 0, mapped by TranslatePlaceholderCP
>> into DefaultSystemCodePage (=1252 at runtime), results in a conversion
>> of the UTF-8 bytes from CP 1252 into CP_OEM :-(
>
> I described two ways in my other mail how to handle that.
I don't want workarounds for a flawed FPC implementation, I want an
FPC that works on Windows without hacks.
> About the example:
> Writeln on the Windows Console requires the console codepage and is
> therefore limited to characters of this codepage.
That's perfectly sufficient for my tests.
> If your code contains
> literals for a specific Windows codepage then you are limiting
> yourself to that codepage (not x-platform). That is your choice.
> OTOH Lazarus main target is x-platform programs. For example the
> UTF8ToConsole solution works on Unix too, while your CP1252
> example does not.
What's CP1252 specific in my example?
> With FPC 2.7.1 there is a new possibility.
Please note that I *am* using and writing about FPC 2.7.1.
> With the new UTF-8 mode
> your example gives:
>
> CP_ACP=65001
> Ansi CP=0 Len=7 =" äöü"
>
> This works on Unix too, while the CP1252 example does not.
> Under Windows it works if the console codepage contains "äöü" (which
> can be more than one codepage). Basically the compiler adds the
> UTF8ToConsole for you.
This works only for a DefaultSystemCodePage of UTF-8, see your CP_ACP
encoding shown above :-(
If this doesn't change, the string encodings are quite useless, and a
single AnsiString type with the fixed encoding CP_UTF8 would be
sufficient (and faster, because the string conversions are omitted).
Windows users may not like that; some prefer to use the default Windows
codepage or UTF-16 instead (Delphi compatible).
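To make the point about dynamic encodings concrete, here is a minimal
diagnostic sketch. It is not a proposed fix and not what the RTL does
internally. It assumes FPC 2.7.1 or later, where SetCodePage,
StringCodePage and UTF8Decode are available in the System unit. The
program name and the variable names are made up for the illustration, and
the string is spelled out as UTF-8 bytes so that the result does not depend
on the source file encoding or a {$codepage} switch.

```pascal
program CodePageLabel;
{$mode objfpc}{$H+}

var
  R: RawByteString;
  U: UnicodeString;
begin
  // ' äöü' written as explicit UTF-8 bytes: 1 + 3*2 = 7 bytes.
  R := #$20#$C3#$A4#$C3#$B6#$C3#$BC;

  WriteLn('DefaultSystemCodePage=', DefaultSystemCodePage);
  // Dynamic encoding as stored with the literal.
  WriteLn('before: CP=', StringCodePage(R), ' Len=', Length(R));

  // Relabel only (Convert=False): the bytes stay untouched, but the
  // dynamic encoding now states what the bytes really are.
  SetCodePage(R, CP_UTF8, False);
  WriteLn('after:  CP=', StringCodePage(R), ' Len=', Length(R));

  // Decoding the same 7 bytes as UTF-8 yields 4 UTF-16 code units.
  U := UTF8Decode(R);
  WriteLn('UTF-16 length=', Length(U));
end.
```

The byte length is 7 before and after the relabeling. Only the code page
stored with the string changes, and that stored value is what later
conversions rely on.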
>> [...]
>> Delphi string literals instead come with their true dynamic encoding, which
>> never can be 0, and thus can be assigned and shown properly. Above code
>> then will show CP=1252 and Len=4 for the AnsiString variable.
>
> No, it should show garbage and Len=7, because the source is UTF-8,
> while the compiler treats it as your system codepage.
Well, I tested my program with XE, with the default Windows text file
encoding. If FPC or Lazarus has problems with such a program file,
then something is flawed :-(
DoDi