[Lazarus] FPC 2.7.1 and console output

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Dec 3 16:46:43 CET 2014


Mattias Gaertner wrote:
> On Tue, 02 Dec 2014 22:41:04 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
> 
>> [...]
>> I can see two major problems with the current FPC AnsiString model. The 
>> first problem is the strange FPC convention, that a string variable can 
>> have a different static/dynamic encoding, not only with RawByteString. 
>> That convention (flaw) can require an explicit SetCodePage for every 
>> string parameter, because a string argument of e.g. static type CP_OEM 
>> (for console output) can have any other actual (dynamic) encoding, not 
>> useful when passing the string to the external function.
> 
> The FPC sources need SetCodePage only in the RTL and only either for
> codepage conversion functions or for Default(RTL)FileSystemCodePage.
> It seems it is not a "major" problem for Lazarus users.

Let's see. I am currently trying to debug AnsiUpperCase, which doesn't seem 
to work.
Q: how do I debug the RTL (step into)?


>> The next problem results from the Delphi incompatible dynamic encoding 
>> of CP_ACP(=0), that seems to be used when a literal is stored in an 
>> AnsiString. These strings have the encoding assumed at *compile time*, 
>> perhaps from a {$codepage ...} switch, which can differ from the 
>> DefaultSystemCodepage at *runtime*. Then the conversion routines assume 
>> that the string is encoded according to DefaultSystemCodepage, which is 
>> not necessarily true:
>>
>> var
>>    A: AnsiString;
>> begin
>>    a := ' äöü';
>>    WriteLn('CP_ACP=',DefaultSystemCodePage);
>>    WriteLn('Ansi CP=',StringCodePage(a),' Len=',Length(a),' ="',a,'"');
>> end.
>>
>> Reports (on Windows) CP_ACP=1252, string CP=0, and due to the Lazarus 
>> File Encoding of UTF-8 the string literal and variable contains UTF-8 
>> (Len=7), as assumed by the compiler. The attempt in WriteLn, to convert 
>> the string to CP_OEM from encoding 0, mapped by TranslatePlaceholderCP 
>> into DefaultSystemCodePage (=1252 at runtime), results in a conversion 
>> of the UTF-8 bytes from CP 1252 into CP_OEM :-(
> 
> I described two ways in my other mail how to handle that.

I don't want workarounds for a flawed FPC implementation; I want an 
FPC working on Windows without hacks.
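
To make the mismatch in the quoted example visible without relying on 
WriteLn's own conversion, the raw bytes of the string can be dumped as hex. 
The following is only a minimal sketch: EffectiveCodePage is a hypothetical 
helper (not an RTL function) that merely mimics the idea of resolving the 
placeholder code page 0 to DefaultSystemCodePage, and the result depends on 
how the source file is saved and on any {$codepage ...} directive.

```pascal
program LiteralEncodingDump;
{$mode objfpc}{$H+}

// Hypothetical helper, NOT part of the RTL: a simplified illustration of
// resolving the placeholder code page 0 (CP_ACP) to DefaultSystemCodePage
// at run time.
function EffectiveCodePage(const S: RawByteString): TSystemCodePage;
begin
  Result := StringCodePage(S);
  if Result = CP_ACP then
    Result := DefaultSystemCodePage;
end;

var
  A: AnsiString;
  I: Integer;
begin
  A := ' äöü';
  WriteLn('DefaultSystemCodePage=', DefaultSystemCodePage);
  WriteLn('StringCodePage=', StringCodePage(A), ' Len=', Length(A));
  WriteLn('Assumed for conversion=', EffectiveCodePage(A));
  // Dump the raw bytes as hex. The output is plain ASCII, so no code page
  // conversion can alter what is shown.
  for I := 1 to Length(A) do
    Write(HexStr(Ord(A[I]), 2), ' ');
  WriteLn;
end.
```

For comparison: the UTF-8 encoding of ' äöü' is the seven bytes 
20 C3 A4 C3 B6 C3 BC, while the CP1252 encoding is the four bytes 
20 E4 F6 FC. A dump showing the first sequence together with an assumed 
code page of 1252 would match the situation described above.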


> About the example:
> Writeln on the Windows Console requires the console codepage and is
> therefore limited to characters of this codepage.

That's perfectly sufficient for my tests.

> If your code contains
> literals for a specific Windows codepage then you are limiting
> yourself to that codepage (not x-platform). That is your choice.
> OTOH Lazarus main target is x-platform programs. For example the
> UTF8ToConsole solution works on Unix too, while your CP1252
> example does not.

What's CP1252 specific in my example?

> With FPC 2.7.1 there is a new possibility.

Please note that I *am* using and writing about FPC 2.7.1.

> With the new UTF-8 mode
> your example gives:
> 
> CP_ACP=65001
> Ansi CP=0 Len=7 =" äöü"
> 
> This works on Unix too, while the CP1252 example does not.
> Under Windows it works if the console codepage contains "äöü" (which
> can be more than one codepage). Basically the compiler adds the
> UTF8ToConsole for you.

This works only for a DefaultSystemCodePage of UTF-8, see your CP_ACP 
encoding shown above :-(

If this doesn't change, the string encodings are quite useless, and a 
single AnsiString type with the fixed encoding CP_UTF8 would be sufficient 
(and faster, because string conversions are omitted). Windows users may not 
like that; some prefer to use the default Windows codepage or UTF-16 instead 
(Delphi compatible).


>> [...]
>> Delphi string literals instead come with their true dynamic encoding, which 
>> never can be 0, and thus can be assigned and shown properly. Above code 
>> then will show CP=1252 and Len=4 for the AnsiString variable.
> 
> No, it should show garbage and Len=7, because the source is UTF-8,
> while the compiler treats it as your system codepage.

Well, I tested my program with XE, using the default Windows text file 
encoding. If FPC or Lazarus has problems with such a program file, 
then something is flawed :-(

DoDi




