[Lazarus] FPC 2.7.1 and console output
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Dec 2 22:41:04 CET 2014
Mattias Gaertner wrote:
> On Tue, 02 Dec 2014 11:21:43 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>> On my first steps with the Unicode RTL I found different behaviour
>> depending on whether a console program is compiled from a plain PAS
>> file or from an LPI/LPR project. [on WinXP, Ansi source files]
>
> What do you mean by "compiled from a PAS file"?
I wrote my first test programs in Notepad and called FPC from the
command line. These files also compile fine with Lazarus, when loaded
as projects.
Next I tried FP (the text-mode IDE), but couldn't figure out how to
configure it to use the trunk compiler :-(
Finally I managed to build Lazarus trunk, and that's where I'm now.
When creating a project in Lazarus, the default file encoding (now?)
seems to be UTF-8 without BOM; at least I had no problems with older
versions - maybe because I only used ASCII in my programs and rarely
wrote console programs.
So how can I convince Lazarus to use (and assume) the Windows codepage
(1252) for the source files of my (non-LCL) projects?
> You can see what parameters the IDE passed to the compiler. Right click
> on the messages (e.g. the "Compile Project..."), then "About Compiler
> Project...".
Then I'd need a wizard that explains to me the consequences of using
(or not using) every possible switch :-(
> Have you set any flags in the "Configure Build Lazarus" dialog?
Not yet - I had a hard time just getting the current Lazarus trunk
version built and configured. It looks to me as if the latest FPC
*release* version is required to build Lazarus, while I want to use
the *trunk* version for compiling my own projects. If so, how do I
tell Lazarus all that?
> What do you mean with "string output"?
WriteLn to the console.
> LCL and debugln expect UTF-8, writeln needs DefaultSystemCodePage in
> FPC 2.7.1 and file is a broad category.
All that should become unimportant with the Unicode RTL, once every
API stub expects/returns strings in the appropriate encoding, or
converts its arguments to the right encoding, and every file object
(TFileStream...) has a settable encoding under the coder's full
control.
I can see two major problems with the current FPC AnsiString model.
The first is the strange FPC convention that a string variable's
actual (dynamic) encoding can differ from its static (declared)
encoding, and not only for RawByteString. That convention (flaw) can
require an explicit SetCodePage for every string parameter, because an
argument whose static type says e.g. CP_OEM (for console output) can
arrive with any other dynamic encoding - useless when the string is
passed on to an external function.
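A minimal sketch of that explicit step, assuming SetCodePage from the
System unit; the wrapper name is made up for illustration:

procedure PassToOemApi(S: RawByteString);
begin
  // Make the dynamic encoding match what the callee expects (CP_OEM):
  // Convert=True transcodes the bytes from S's current dynamic
  // encoding, instead of merely relabeling them.
  SetCodePage(S, CP_OEMCP, True);
  // ... now hand S to the OEM-expecting external function ...
end;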
The second problem results from the Delphi-incompatible dynamic
encoding CP_ACP(=0), which seems to be attached when a literal is
stored in an AnsiString. Such strings hold bytes in the encoding
assumed at *compile time*, perhaps from a {$codepage ...} switch,
which can differ from DefaultSystemCodePage at *runtime*. The
conversion routines then assume that the string is encoded according
to DefaultSystemCodePage, which is not necessarily true:
var
  A: AnsiString;
begin
  A := ' äöü';
  WriteLn('CP_ACP=', DefaultSystemCodePage);
  WriteLn('Ansi CP=', StringCodePage(A), ' Len=', Length(A), ' ="', A, '"');
end.
This reports (on Windows) CP_ACP=1252 and string CP=0; because of the
Lazarus file encoding of UTF-8, the string literal and hence the
variable contain UTF-8 bytes (Len=7), just as the compiler assumed.
WriteLn's attempt to convert the string to CP_OEM from encoding 0,
mapped by TranslatePlaceholderCP into DefaultSystemCodePage (=1252 at
runtime), results in a conversion of the UTF-8 bytes from CP 1252 into
CP_OEM :-(
Even if the string A is assigned to a variable of type
AnsiString(CP_OEMCP), that string ends up with a dynamic encoding of
437, but still contains the 7 UTF-8 bytes :-(
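In code, with a codepage-aware string type declared here only for
illustration (a sketch; the values in the comments are those reported
on my system above):

type
  TOemString = type AnsiString(CP_OEMCP);
var
  A: AnsiString;
  O: TOemString;
begin
  A := ' äöü';  // dynamic encoding 0, bytes actually UTF-8 (see above)
  O := A;       // converted as if A were encoded in CP 1252
  WriteLn('OEM CP=', StringCodePage(O), ' Len=', Length(O));  // 437, 7
end.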
This mess could easily be eliminated if FPC stored the *true*
compile-time encoding with all string literals, not CP_ACP(=0). Delphi
string literals instead come with their true dynamic encoding, which
can never be 0, and thus can be assigned and displayed properly. The
code above then shows CP=1252 and Len=4 for the AnsiString variable.
What's missing in XE is the automatic conversion into CP_OEM on
console output, as done by FPC.
The use of TranslatePlaceholderCP() could then also be reduced to the
rare cases where *user* code is allowed to supply an encoding value
(TSystemCodePage) separately from a string, e.g. in AnsiString type
declarations or in SetCodePage, as sketched below.
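For reference, a minimal sketch of the two SetCodePage modes, with the
user supplying the TSystemCodePage values:

var
  S: RawByteString;
begin
  S := 'abc';
  SetCodePage(S, 1252, False);  // relabel only: bytes stay untouched
  SetCodePage(S, 437, True);    // convert: bytes transcoded 1252 -> 437
end.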
As an ugly hack the coder can use {$codepage UTF8}, *and* force
Lazarus to store the source files accordingly, and hope that the
literals then get a useful dynamic encoding. This *is* a hack, because
it makes the source files almost unusable with other programs
(editors...) on Windows and on other platforms whose default encoding
(CP_ACP) is not UTF-8. I also wonder what will happen on platforms
with a default encoding of CP_UTF8, when the user is allowed to change
that default codepage - and does so - for his entire system or for an
(FPC) program.
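Spelled out, the hack amounts to this (the file itself must then
really be saved as UTF-8 for the directive to be truthful):

{$codepage utf8}
var
  A: AnsiString;
begin
  A := ' äöü';  // the directive declares these source bytes as UTF-8
  WriteLn('CP=', StringCodePage(A), ' Len=', Length(A));
end.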
>> When I try {$codepage 1252}, I get an error during compilation
>> ("file not open"). The "charset" unit, mentioned in the help on
>> $codepage, doesn't give any clues about allowed values. What's the
>> correct value for the Windows default (western)?
>
> {$codepage cp1252}
>
>> Or how else can I establish a global default sourcecode codepage?
>
> If by "global" you mean your project: add -FcCP1252 to the custom
> compiler options.
Thanks :-)
What exactly does this switch mean, e.g. for my source files?
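If I understand it right, it should correspond to a direct compiler
invocation like this (assuming a CP1252-encoded test.pas):

fpc -FcCP1252 test.pas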
DoDi