[Lazarus] FPC 2.7.1 and console output
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Dec 2 22:41:04 CET 2014
Mattias Gaertner wrote:
> On Tue, 02 Dec 2014 11:21:43 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>> On my first steps with the Unicode RTL I found different behaviour
>> depending on whether a console program is compiled from a plain PAS
>> file or from an LPI/LPR project. [on WinXP, Ansi source files]
>
> What do you mean by "compiled from a PAS file"?
I wrote my first test programs in Notepad and called FPC from the
command line. These files also compile fine with Lazarus, when loaded
as projects.
Next I tried FP (the text-mode IDE), but couldn't figure out how to
configure it to use the trunk compiler :-(
Finally I managed to build Lazarus trunk, and that's where I'm now.
When creating a project in Lazarus, the default file encoding (now?)
seems to be UTF-8 without BOM; at least I had no problems with older
versions - maybe because I only used ASCII in my programs and rarely
wrote console programs.
So how can I convince Lazarus to use (and assume) the Windows codepage
(1252) for the source files of my (non-LCL) projects?
> You can see what parameters the IDE passed to the compiler. Right click
> on the messages (e.g. the "Compile Project..."), then "About Compiler
> Project...".
Then I'd need a wizard that explains to me the consequences of using
(or not using) every possible switch :-(
> Have you set any flags in the "Configure Build Lazarus" dialog?
Not yet - I had a hard time just getting the current Lazarus trunk
version built and configured. It looks to me as if the latest FPC
*release* version is required to build Lazarus, while I want to use
the *trunk* version for compiling my own projects. If so, how do I
tell Lazarus all that?
> What do you mean with "string output"?
WriteLn to the console.
> LCL and debugln expect UTF-8, writeln needs DefaultSystemCodePage in
> FPC 2.7.1 and file is a broad category.
All that should become unimportant with the Unicode RTL, once every
API stub expects/returns strings in the appropriate encoding, or
converts its arguments to the right encoding, and every file object
(TFileStream...) has a settable encoding under the coder's full
control.
I can see two major problems with the current FPC AnsiString model.
The first is the strange FPC convention that a string variable's
actual (dynamic) encoding can differ from its static (declared)
encoding, and not only for RawByteString. That convention (flaw) can
require an explicit SetCodePage for every string parameter, because an
argument whose static type says e.g. CP_OEM (for console output) can
arrive with any other dynamic encoding - useless when the string is
passed on to an external function.
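A minimal sketch of that explicit step, assuming SetCodePage from the
System unit; the wrapper name is made up for illustration:

procedure PassToOemApi(S: RawByteString);
begin
  // Make the dynamic encoding match what the callee expects (CP_OEM):
  // Convert=True transcodes the bytes from S's current dynamic
  // encoding, instead of merely relabeling them.
  SetCodePage(S, CP_OEMCP, True);
  // ... now hand S to the OEM-expecting external function ...
end;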
The second problem results from the Delphi-incompatible dynamic
encoding CP_ACP(=0), which seems to be attached when a literal is
stored in an AnsiString. Such strings hold bytes in the encoding
assumed at *compile time*, perhaps from a {$codepage ...} switch,
which can differ from DefaultSystemCodePage at *runtime*. The
conversion routines then assume that the string is encoded according
to DefaultSystemCodePage, which is not necessarily true:
var
  A: AnsiString;
begin
  A := ' äöü';
  WriteLn('CP_ACP=', DefaultSystemCodePage);
  WriteLn('Ansi CP=', StringCodePage(A), ' Len=', Length(A), ' ="', A, '"');
end.
This reports (on Windows) CP_ACP=1252 and string CP=0; because of the
Lazarus file encoding of UTF-8, the string literal and hence the
variable contain UTF-8 bytes (Len=7), just as the compiler assumed.
WriteLn's attempt to convert the string to CP_OEM from encoding 0,
mapped by TranslatePlaceholderCP into DefaultSystemCodePage (=1252 at
runtime), results in a conversion of the UTF-8 bytes from CP 1252 into
CP_OEM :-(
Even if the string A is assigned to a variable of type
AnsiString(CP_OEMCP), that string ends up with a dynamic encoding of
437, but still contains the 7 UTF-8 bytes :-(
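In code, with a codepage-aware string type declared here only for
illustration (a sketch; the values in the comments are those reported
on my system above):

type
  TOemString = type AnsiString(CP_OEMCP);
var
  A: AnsiString;
  O: TOemString;
begin
  A := ' äöü';  // dynamic encoding 0, bytes actually UTF-8 (see above)
  O := A;       // converted as if A were encoded in CP 1252
  WriteLn('OEM CP=', StringCodePage(O), ' Len=', Length(O));  // 437, 7
end.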
This mess could easily be eliminated if FPC stored the *true*
compile-time encoding with all string literals, not CP_ACP(=0). Delphi
string literals instead come with their true dynamic encoding, which
can never be 0, and thus can be assigned and displayed properly. The
code above then shows CP=1252 and Len=4 for the AnsiString variable.
What's missing in XE is the automatic conversion into CP_OEM on
console output, as done by FPC.
The use of TranslatePlaceholderCP() could then also be reduced to the
rare cases where *user* code is allowed to supply an encoding value
(TSystemCodePage) separately from a string, e.g. in AnsiString type
declarations or in SetCodePage, as sketched below.
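For reference, a minimal sketch of the two SetCodePage modes, with the
user supplying the TSystemCodePage values:

var
  S: RawByteString;
begin
  S := 'abc';
  SetCodePage(S, 1252, False);  // relabel only: bytes stay untouched
  SetCodePage(S, 437, True);    // convert: bytes transcoded 1252 -> 437
end.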
As an ugly hack the coder can use {$codepage UTF8}, *and* force
Lazarus to store the source files accordingly, and hope that the
literals then get a useful dynamic encoding. This *is* a hack, because
it makes the source files almost unusable with other programs
(editors...) on Windows and on other platforms whose default encoding
(CP_ACP) is not UTF-8. I also wonder what will happen on platforms
with a default encoding of CP_UTF8, when the user is allowed to change
that default codepage - and does so - for his entire system or for an
(FPC) program.
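Spelled out, the hack amounts to this (the file itself must then
really be saved as UTF-8 for the directive to be truthful):

{$codepage utf8}
var
  A: AnsiString;
begin
  A := ' äöü';  // the directive declares these source bytes as UTF-8
  WriteLn('CP=', StringCodePage(A), ' Len=', Length(A));
end.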
>> When I try {$codepage 1252}, I get an error during compilation
>> ("file not open"). The "charset" unit, mentioned in the help on
>> $codepage, doesn't give any clues about allowed values. What's the
>> correct value for the Windows default (western)?
>
> {$codepage cp1252}
>
>> Or how else can I establish a global default sourcecode codepage?
>
> If by "global" you mean your project: add -FcCP1252 to the custom
> compiler options.
Thanks :-)
What exactly does this switch mean, e.g. for my source files?
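If I understand it right, it should correspond to a direct compiler
invocation like this (assuming a CP1252-encoded test.pas):

fpc -FcCP1252 test.pas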
DoDi