[Lazarus] Feature Request: Insert {codepage UTF8} per default

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Mar 31 12:44:07 CEST 2016


On Thu, 31 Mar 2016 00:16:13 +0200
"Michael W. Vogel" <m-w-vogel at gmx.de> wrote:

>[...]
> I've tested the example too and I got different results with different 
> options. The test was:
> - BOM / no BOM at the beginning of the sourcefile
> - {$codepage UTF8} or not

The compiler understands -FcUTF8, {$codepage utf8}
and a BOM. All three set the source codepage to UTF-8. See here:
http://wiki.freepascal.org/FPC_Unicode_support#Source_file_codepage

BOM has the advantage that it is understood by other text editors as
well and the disadvantage that it is hidden, so that people unaware
of encodings are easily confused.

-FcUTF8 has the advantage that it applies to all sources in the
project/package and can easily be turned off. You can unset it for a
single unit via {$modeswitch systemcodepage}.
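
As a minimal sketch of the per-file variant (the program name and the
variable are made up for illustration, and the file is assumed to be
saved as UTF-8): with the directive in place the compiler knows how to
decode the literal, so the UnicodeString should hold one UTF-16 code
unit per umlaut instead of one per source byte.

```pascal
program cpdemo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}
{$codepage utf8}  // declares that this source file is UTF-8 encoded
                  // (-FcUTF8 on the command line or a BOM do the same)

var
  w: UnicodeString;
begin
  // The compiler decodes the literal using the declared source
  // codepage before storing it in the UTF-16 string.
  w := 'äöü';
  // Prints the number of UTF-16 code units in w.
  WriteLn(Length(w));
end.
```

Removing the directive and compiling once with and once without
-FcUTF8 is a quick way to see the difference on your own system.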


> - fpc -MObjFPC *-Sh* test.pas (with / without -Sh (use reference counted 
> strings))

And this is where the confusion starts. Mixing multiple string
types is asking for trouble. FPC has an impressive (aka frightening)
list of string types and consequently a vast net of combinations that
only graph theorists can appreciate.

> So it is really more complex than I thought...

Yes. 
And you have not yet explored the difficulties in code supporting
both FPC 2.6.4 and 3+ and LCL 1.4 and 1.6.

Although Lazarus recommends "simply" using UTF-8, technically it
recommends AnsiString, DefaultSystemCodePage set to CP_UTF8, no
explicit codepage, and the UTF-8 functions in LazUtils.
If you need to use other string types in a unit, you might want to add
an explicit codepage. Maybe a paragraph should be added to the wiki
about using non-AnsiString types with the "Lazarus UTF-8".
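
A rough sketch of that recommended setup, under some assumptions: the
file is saved as UTF-8 without a BOM, LazUtils is on the unit search
path (for example as a dependency of a Lazarus project), and FPC 3+
with LCL 1.6 is used. The program name is made up. There is no
explicit codepage here, so the literal stays as the raw UTF-8 bytes
from the source file.

```pascal
program lazutf8demo;  // hypothetical name, for illustration only
{$mode objfpc}{$H+}

uses
  LazUTF8;  // UTF-8 helper functions from the LazUtils package

var
  s: string;  // AnsiString because of {$H+}
begin
  s := 'äöü';
  // Length counts bytes of the UTF-8 encoded AnsiString.
  WriteLn('bytes:      ', Length(s));
  // UTF8Length counts codepoints instead of bytes.
  WriteLn('codepoints: ', UTF8Length(s));
  // UTF8Copy takes codepoint positions, so it does not split a
  // multi-byte sequence the way Copy with byte indexes could.
  WriteLn('first codepoint takes ', Length(UTF8Copy(s, 1, 1)), ' byte(s)');
end.
```

Only numbers and ASCII text are written, so the console encoding does
not get in the way of the comparison.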

 
Mattias



