[Lazarus] FPC 2.7.1 and console output

Mattias Gaertner nc-gaertnma at netcologne.de
Wed Dec 3 13:44:55 CET 2014


On Tue, 02 Dec 2014 22:41:04 +0100
Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:

>[...]
> I can see two major problems with the current FPC AnsiString model. The 
> first problem is the strange FPC convention that a string variable can 
> have different static and dynamic encodings, not only with RawByteString. 
> That convention (flaw) can require an explicit SetCodePage for every 
> string parameter, because a string argument of, e.g., static type CP_OEM 
> (for console output) can have any other actual (dynamic) encoding, which 
> is not useful when passing the string to the external function.

The FPC sources need SetCodePage only in the RTL and only either for
codepage conversion functions or for Default(RTL)FileSystemCodePage.
It seems it is not a "major" problem for Lazarus users.

 
> The next problem results from the Delphi-incompatible dynamic encoding 
> of CP_ACP (=0), which seems to be used when a literal is stored in an 
> AnsiString. These strings have the encoding assumed at *compile time*, 
> perhaps from a {$codepage ...} switch, which can differ from the 
> DefaultSystemCodePage at *runtime*. The conversion routines then assume 
> the string is encoded according to DefaultSystemCodePage, which is not 
> necessarily true:
> 
> var
>    A: AnsiString;
> begin
>    a := ' äöü';
>    WriteLn('CP_ACP=',DefaultSystemCodePage);
>    WriteLn('Ansi CP=',StringCodePage(a),' Len=',Length(a),' ="',a,'"');
> end.
> 
> Reports (on Windows) CP_ACP=1252 and string CP=0, and because the Lazarus 
> file encoding is UTF-8, the string literal and the variable contain UTF-8 
> (Len=7), as assumed by the compiler. WriteLn's attempt to convert the 
> string from encoding 0 to CP_OEM (0 is mapped by TranslatePlaceholderCP 
> to DefaultSystemCodePage, i.e. 1252 at runtime) results in a conversion 
> of the UTF-8 bytes from CP 1252 into CP_OEM :-(

I described two ways in my other mail how to handle that.
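As a rough illustration (not FPC code), the conversion path described in the quoted paragraph can be modeled in a few lines of Python. The helper names, including `translate_placeholder_cp`, are hypothetical stand-ins that only mimic the behavior attributed to TranslatePlaceholderCP above; CP 850 is assumed as the OEM console codepage.

```python
CP_ACP = 0  # placeholder code page stored with the literal


def codec_name(cp: int) -> str:
    # Map a Windows code page number to a Python codec name (65001 is UTF-8).
    return "utf-8" if cp == 65001 else f"cp{cp}"


def translate_placeholder_cp(cp: int, default_system_cp: int) -> int:
    # Hypothetical stand-in: resolve CP_ACP against the *runtime* default.
    return default_system_cp if cp == CP_ACP else cp


def to_console(data: bytes, declared_cp: int, default_system_cp: int,
               console_cp: int) -> str:
    # Decode the bytes with the resolved code page, re-encode them for the
    # console code page, and return the text the console would display.
    source_cp = translate_placeholder_cp(declared_cp, default_system_cp)
    text = data.decode(codec_name(source_cp))
    out = text.encode(codec_name(console_cp), errors="replace")
    return out.decode(codec_name(console_cp))


# The literal ' äöü' as stored by the compiler from a UTF-8 source file.
literal = " äöü".encode("utf-8")  # 7 bytes

# DefaultSystemCodePage is 1252 at runtime, so the UTF-8 bytes are
# misread as CP1252 characters before being converted to CP 850.
shown = to_console(literal, CP_ACP, 1252, 850)
# shown == " Ã¤Ã¶Ã¼"
```

Each two-byte UTF-8 sequence is decoded as two separate CP1252 characters, which is why the console shows six characters of garbage instead of three umlauts.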

About the example:
Writeln on the Windows Console requires the console codepage and is
therefore limited to characters of this codepage. If your code contains
literals for a specific Windows codepage then you are limiting
yourself to that codepage (not x-platform). That is your choice.
OTOH, Lazarus' main target is x-platform programs. For example, the
UTF8ToConsole solution works on Unix too, while your CP1252
example does not.

With FPC 2.7.1 there is a new possibility. With the new UTF-8 mode
your example gives:

CP_ACP=65001
Ansi CP=0 Len=7 =" äöü"

This works on Unix too, while the CP1252 example does not.
Under Windows it works if the console codepage contains "äöü" (which
can be true for more than one codepage). Basically, the compiler adds the
UTF8ToConsole call for you.
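The effect of the UTF-8 mode can be sketched with a similar hypothetical Python model. `utf8_to_console` is an illustrative name only, not the LazUTF8 UTF8ToConsole routine; CP 850 (Western European) and CP 866 (Cyrillic) are picked as example console codepages, and unmappable characters are assumed to become '?'.

```python
def codec_name(cp: int) -> str:
    # Map a Windows code page number to a Python codec name (65001 is UTF-8).
    return "utf-8" if cp == 65001 else f"cp{cp}"


def utf8_to_console(data: bytes, console_cp: int) -> str:
    # With DefaultSystemCodePage = 65001, a CP_ACP string is decoded as
    # UTF-8; characters missing from the console code page become '?'.
    text = data.decode("utf-8")
    out = text.encode(codec_name(console_cp), errors="replace")
    return out.decode(codec_name(console_cp))


literal = " äöü".encode("utf-8")          # 7 bytes, as in the output above
western = utf8_to_console(literal, 850)   # CP 850 contains "äöü"
cyrillic = utf8_to_console(literal, 866)  # CP 866 does not
# western == " äöü", cyrillic == " ???"
```

The bytes are now interpreted correctly, so the only remaining limit is the one stated above: the console codepage must actually contain the characters.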


>[...]
> Delphi string literals instead come with their true dynamic encoding, which 
> can never be 0, and thus can be assigned and shown properly. The above code 
> will then show CP=1252 and Len=4 for the AnsiString variable.

No, it should show garbage and Len=7, because the source is UTF-8,
while the compiler treats it as your system codepage.

>[...]
> I also wonder what will happen on platforms with a default 
> encoding of CP_UTF8 when the user is allowed to change that default 
> codepage to something else, either for the entire system or for an (FPC) 
> program, and does so.

Depends on what you do.

 
> > If you mean with "global" your project: Add -FcCP1252 to the custom
> > compiler options.
> 
> Thanks :-)
> What does this switch mean, e.g. to my source files?

Same as {$codepage}.
The IDE does not use this flag.

Mattias



