[Lazarus] Feature Request: Insert {codepage UTF8} per default

Bart bartjunk64 at gmail.com
Wed Mar 30 18:16:32 CEST 2016


On 3/30/16, Juha Manninen <juha.manninen62 at gmail.com> wrote:

> Do your files have UTF-8 encoding? It is a necessity for the Unicode
> system to work.

Yes, all my code is either from Lazarus or from my own editor (which
is a synedit).

> Any valid UTF-8 string should work, including diacritics.
Without the codepage identifier?

Quote from http://wiki.freepascal.org/FPC_Unicode_support#String_constants:
"Normally, a string constant is interpreted according to the source
file codepage. If the source file codepage is CP_ACP, a default is
used instead: in that case, during conversions the constant strings
are assumed to have code page 28591 (ISO 8859-1 Latin 1; Western
European). "
...
"From the above it follows that to ensure predictable interpretation
of string constants in your source code, it is best to either include
an explicit {$codepage xxx} directive (or use the equivalent -Fc
command line option), or to save the source code in UTF-8 with a BOM.
"
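
To make the two options from the quote concrete, here is a minimal sketch of
what the directive could look like at the top of a unit. The unit name and the
constant name are hypothetical, and the note about scope is my reading of the
wiki, not something I verified against the compiler sources.

```pascal
unit cpdemo;  // hypothetical unit name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}  // declare that this source file is encoded as UTF-8

// Alternative without touching the source: pass the codepage on the
// command line, e.g.  fpc -Fcutf8 cpdemo.pas

interface

const
  Greeting = 'ÄAÄ';  // non-ASCII literal, interpreted according to the directive above

implementation

end.
```

The directive has to be repeated in every source file that contains non-ASCII
literals, whereas -Fc applies to everything compiled in that invocation.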

AFAIK the IDE does not save the file with a BOM, so the compiler may
very well decide that my sourcefile has ACP codepage?

Consider this test sourcefile (encoded as UTF8 without BOM):

const
  TestUtf8 = 'ÄAÄ';

begin
  writeln('DefaultSystemcodePage = ',DefaultSystemcodePage);
  writeln('TestUtf8 = ',StrToHex(TestUtf8));
  s1 := TestUtf8;
  writeln('S1       = ',StrToHex(S1),' [',StringCodePage(S1),']');
  writeln(S1); // will trigger automatic codepage conversion to the console's codepage when needed
end.

Without {$codepage utf8} it outputs:
DefaultSystemcodePage = 1252
TestUtf8 = $C3 $84 $41 $C3 $84
S1       = $C3 $84 $41 $C3 $84 [0]
Ã"AÃ"

The compiler treats my source as if it were written in my system's codepage.
With cp1252, S1 now contains garbage (Ã"AÃ"), or at least not what I
expected it to be.

With the proper {$codepage utf8} inserted it will output:
DefaultSystemcodePage = 1252
TestUtf8 = $C3 $84 $41 $C3 $84
S1       = $C3 $84 $41 $C3 $84 [65001]
ÄAÄ
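
For anyone who wants to repeat the experiment, a self-contained sketch of the
test follows. StrToHex here is a hypothetical stand-in for my own helper (it
just dumps the raw bytes), the program name is made up, and the variable
declaration is filled in as a plain string. Save the file as UTF-8 without BOM
and compile it once as is and once with the directive enabled.

```pascal
program cptest;  // hypothetical program name

{$mode objfpc}{$H+}
{ $codepage utf8}  // remove the space after '{' to enable the directive

// Stand-in for the StrToHex helper: returns the bytes of S as "$XX $XX ...".
// RawByteString is used so that no codepage conversion happens on the way in.
function StrToHex(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + '$' + HexStr(Ord(S[i]), 2);
  end;
end;

const
  TestUtf8 = 'ÄAÄ';

var
  S1: string;  // AnsiString because of {$H+}

begin
  writeln('DefaultSystemCodePage = ', DefaultSystemCodePage);
  writeln('TestUtf8 = ', StrToHex(TestUtf8));
  S1 := TestUtf8;
  writeln('S1       = ', StrToHex(S1), ' [', StringCodePage(S1), ']');
  writeln(S1);  // converted to the console's codepage when needed
end.
```

The interesting part is the number in square brackets and the last line of
output: the bytes are the same in both runs, only the codepage attached to the
string differs.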

I would say that this experiment contradicts the statement in
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals
?

Bart



