[Lazarus] Feature Request: Insert {codepage UTF8} per default

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Mar 31 12:44:07 CEST 2016

On Thu, 31 Mar 2016 00:16:13 +0200
"Michael W. Vogel" <m-w-vogel at gmx.de> wrote:

> I've tested the example too and I got different results with different 
> options. The test was:
> - BOM / no BOM at the beginning of the sourcefile
> - {$codepage UTF8} or not

The compiler understands -FcUTF8, {$codepage utf8},
and a BOM. All three set the source encoding to UTF-8. See here:

BOM has the advantage that it is understood by other text editors as
well and the disadvantage that it is hidden, so that people unaware
of encodings are easily confused.

-FcUTF8 has the advantage that it applies to all sources in the
project/package and can easily be turned off. You can unset it for a
single unit via {$modeswitch systemcodepage}.
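
As a rough sketch of the per-unit variant, the directive goes near the
top of the source file. The program name and the literal are made up
for illustration, and what gets printed depends on the string type and
the system codepage, so no output is claimed here:

```pascal
program cpdemo;              // hypothetical name, for illustration only
{$mode objfpc}{$H+}          // {$H+}: "string" means AnsiString
{$codepage utf8}             // declare that this source file is UTF-8

// Project-wide alternative: omit the directive and compile with
//   fpc -FcUTF8 cpdemo.pas
// A BOM at the start of the file is the third way.

var
  s: string;
begin
  s := 'äöü';                // non-ASCII literal, read as UTF-8
  WriteLn(s);
end.
```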

> - fpc -MObjFPC *-Sh* test.pas (with / without -Sh (use reference counted 
> strings))

And this is where the confusion starts. Mixing multiple string
types is asking for trouble. FPC has an impressive (aka frightening)
list of string types and consequently a vast net of combinations that
only graph theorists can appreciate.

> So it is really more complex than I thought...

And you have not yet explored the difficulties in code supporting
both FPC 2.6.4 and 3+ and LCL 1.4 and 1.6.

Although Lazarus recommends "simply" using UTF-8,
technically it recommends AnsiString, DefaultSystemCodepage CP_UTF8, no
explicit codepage, and the UTF-8 functions in LazUtils.
If you need to use other string types in a unit you might want to add
an explicit codepage. Maybe a paragraph should be added to the wiki
about using non-AnsiString string types with the "Lazarus UTF-8".
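
A minimal sketch of that recommended setup, assuming the file is saved
as UTF-8 without a BOM, no -Fc option is passed, and the LazUtils
package is on the unit path. The program name and the literal are
hypothetical; UTF8Length and UTF8Copy come from the LazUTF8 unit:

```pascal
program utf8demo;            // hypothetical name, for illustration only
{$mode objfpc}{$H+}          // "string" is AnsiString, no {$codepage}

uses
  LazUTF8;                   // UTF-8 helper functions from LazUtils

var
  s: string;
begin
  s := 'äöü';
  WriteLn(Length(s));          // length in bytes
  WriteLn(UTF8Length(s));      // length in UTF-8 codepoints
  WriteLn(UTF8Copy(s, 2, 1));  // one codepoint, starting at the second
end.
```

Length counts bytes while UTF8Length counts codepoints, which is why
the LazUtils functions are preferred for anything beyond plain ASCII.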
