[Lazarus] Feature Request: Insert {codepage UTF8} per default
Bart
bartjunk64 at gmail.com
Wed Mar 30 18:16:32 CEST 2016
On 3/30/16, Juha Manninen <juha.manninen62 at gmail.com> wrote:
> Do your files have UTF-8 encoding? It is a necessity for the Unicode
> system to work.
Yes. All my code comes either from Lazarus or from my own editor (which
is SynEdit-based).
> Any valid UTF-8 string should work, including diacritics.
Without the codepage identifier?
Quote from http://wiki.freepascal.org/FPC_Unicode_support#String_constants:
"Normally, a string constant is interpreted according to the source
file codepage. If the source file codepage is CP_ACP, a default is
used instead: in that case, during conversions the constant strings
are assumed to have code page 28591 (ISO 8859-1 Latin 1; Western
European). "
...
"From the above it follows that to ensure predictable interpretation
of string constants in your source code, it is best to either include
an explicit {$codepage xxx} directive (or use the equivalent -Fc
command line option), or to save the source code in UTF-8 with a BOM.
"
AFAIK the IDE does not save files with a BOM, so couldn't the compiler
very well decide that my source file has the ACP codepage?
Consider this test source file (encoded as UTF-8 without BOM):
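{$mode objfpc}{$H+}
uses SysUtils;
// StrToHex is not an RTL routine; hypothetical stand-in that prints each byte as $XX
function StrToHex(const S: RawByteString): String;
var i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do Result := Result + '$' + IntToHex(Ord(S[i]), 2) + ' ';
end;
const
  TestUtf8 = 'ÄAÄ';
var
  S1: String;
begin
  writeln('DefaultSystemcodePage = ', DefaultSystemCodePage);
  writeln('TestUtf8 = ', StrToHex(TestUtf8));
  S1 := TestUtf8;
  writeln('S1 = ', StrToHex(S1), ' [', StringCodePage(S1), ']');
  writeln(S1); // triggers automatic conversion to the console's codepage when needed
end.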
Without {$codepage utf8} it outputs:
DefaultSystemcodePage = 1252
TestUtf8 = $C3 $84 $41 $C3 $84
S1 = $C3 $84 $41 $C3 $84 [0]
Ã„AÃ„
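The compiler treats my source as if it were written in my system's codepage.
S1 holds the UTF-8 bytes unchanged, but they are tagged with codepage 0
(CP_ACP, i.e. 1252 here), so each byte is interpreted as a cp1252
character and S1 comes out as garbage (Ã„AÃ„), or at least not what I
expected it to be.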
With the proper {$codepage utf8} inserted it will output:
DefaultSystemcodePage = 1252
TestUtf8 = $C3 $84 $41 $C3 $84
S1 = $C3 $84 $41 $C3 $84 [65001]
ÄAÄ
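In both runs S1 contains the same five bytes; only the codepage tag differs ([0] versus [65001]). As a minimal sketch, assuming FPC 3.x (where the System unit provides StringCodePage, SetCodePage and CP_UTF8) and the same Windows setup as above, relabelling those bytes at run time without converting them should have the same effect as the directive for this one string. It illustrates the tagging issue only and is not a substitute for {$codepage utf8}; the exact numbers printed depend on the platform's DefaultSystemCodePage.

```pascal
{$mode objfpc}{$H+}
// Sketch: no {$codepage} directive, file saved as UTF-8 without BOM.
const
  TestUtf8 = 'ÄAÄ';
var
  S1: RawByteString;
begin
  S1 := TestUtf8;
  // Expected to carry the source codepage (CP_ACP, 0), as in the first run.
  writeln('before: [', StringCodePage(S1), ']');
  // Relabel the existing bytes as UTF-8; Convert = False leaves the bytes untouched.
  SetCodePage(S1, CP_UTF8, False);
  writeln('after:  [', StringCodePage(S1), ']');
  writeln(S1); // converted from UTF-8 to the console's codepage when needed
end.
```

Note that SetCodePage does nothing if the old and new codepages resolve to the same value, e.g. on a system whose DefaultSystemCodePage is already 65001.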
I would say that this experiment contradicts the statement in
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#String_Literals ?
Bart