[Lazarus] Feature Request: Insert {$codepage UTF8} by default

Michael W. Vogel m-w-vogel at gmx.de
Thu Mar 31 22:07:21 CEST 2016


On 31.03.2016 at 12:44, Mattias Gaertner wrote:
> On Thu, 31 Mar 2016 00:16:13 +0200
> "Michael W. Vogel" <m-w-vogel at gmx.de> wrote:
>
>> [...]
>> I've tested the example too and I got different results with different
>> options. The test was:
>> - BOM / no BOM at the beginning of the source file
>> - {$codepage UTF8} or not
> The compiler understands -FcUTF8, {$codepage utf8}
> and a BOM. All three set UTF-8. See here:
> http://wiki.freepascal.org/FPC_Unicode_support#Source_file_codepage
>
> A BOM has the advantage that other text editors understand it as
> well, and the disadvantage that it is invisible, so people unaware
> of encodings are easily confused.
>
> -FcUTF8 has the advantage that it applies to all sources in the
> project/package and can easily be turned off. You can unset it for a
> single unit via {$modeswitch systemcodepage}.
>
>
>> - fpc -MObjFPC *-Sh* test.pas (with and without -Sh, i.e. use
>> reference-counted strings)
> And this is where the confusion starts. Mixing multiple string
> types is asking for trouble. FPC has an impressive (aka frightening)
> list of string types and consequently a vast net of combinations that
> only graph theorists can appreciate.
>
>> So it really is more complex than I thought...
> Yes.
> And you have not yet explored the difficulties in code supporting
> both FPC 2.6.4 and 3+ and LCL 1.4 and 1.6.
>
> Although Lazarus recommends "simply" using UTF-8,
> technically it recommends AnsiString, DefaultSystemCodepage CP_UTF8, no
> explicit codepage, and the UTF-8 functions in LazUtils.
> If you need to use other string types in a unit, you might want to add
> an explicit codepage. Maybe a paragraph should be added to the wiki
> about using non-AnsiString types with the "Lazarus UTF-8" approach.
>
>   
> Mattias
>
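
To make the three mechanisms concrete, here is a toy Python model of how a source file ends up marked as UTF-8. The helper name marks_source_as_utf8 and its fc_utf8 parameter are hypothetical; this is not how FPC's scanner works, and it ignores comments, string literals and {$modeswitch systemcodepage}.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# Simplified, case-insensitive match for the {$codepage utf8} directive.
# Toy assumption: it is searched anywhere in the file, even inside comments.
CODEPAGE_UTF8 = re.compile(rb"\{\$codepage\s+utf8\s*\}", re.IGNORECASE)


def marks_source_as_utf8(source: bytes, fc_utf8: bool = False) -> bool:
    """Hypothetical helper: True if any of the three mechanisms applies.

    fc_utf8 models passing -FcUTF8 on the command line.
    """
    if fc_utf8:                      # -FcUTF8: applies to every source file
        return True
    if source.startswith(UTF8_BOM):  # BOM: invisible marker at file start
        return True
    # Explicit directive inside the source text
    return CODEPAGE_UTF8.search(source) is not None


plain = "program t; begin WriteLn('ä') end.".encode("utf-8")
with_bom = UTF8_BOM + plain
with_directive = b"{$codepage utf8}\n" + plain

for name, src, flag in [
    ("plain", plain, False),
    ("BOM", with_bom, False),
    ("{$codepage utf8}", with_directive, False),
    ("-FcUTF8", plain, True),
]:
    print(f"{name:18} -> UTF-8: {marks_source_as_utf8(src, flag)}")
```

With these inputs every case except "plain" reports UTF-8; the file bytes are identical in all four cases apart from the BOM or directive.
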
Thank you very much for your detailed answer!

I'll run some more tests to understand why a UTF-8 BOM behaves
differently from {$codepage UTF8}.
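
Why the source codepage matters at all can be seen outside FPC. The following minimal Python sketch assumes the file was saved as UTF-8 and uses cp1252 as a stand-in for a typical Windows system codepage. It shows what the bytes of the literal 'äöü' turn into when they are read with the wrong codepage and then converted back to UTF-8. It does not reproduce FPC's own conversion rules, which also depend on the string types involved.

```python
# Hypothetical illustration (not FPC code): the bytes of a string literal in a
# source file saved as UTF-8, interpreted under two different assumed source
# codepages. cp1252 stands in for a typical Windows system codepage.
literal = "äöü"
raw = literal.encode("utf-8")       # bytes in the file: C3 A4 C3 B6 C3 BC

as_utf8 = raw.decode("utf-8")       # source codepage correctly taken as UTF-8
as_cp1252 = raw.decode("cp1252")    # source codepage wrongly taken as cp1252

# Converting the misread text "back" to UTF-8 double-encodes it.
double_encoded = as_cp1252.encode("utf-8")

print(len(raw), len(as_utf8), len(as_cp1252), len(double_encoded))  # 6 3 6 12
print([f"U+{ord(c):04X}" for c in as_cp1252])
# ['U+00C3', 'U+00A4', 'U+00C3', 'U+00B6', 'U+00C3', 'U+00BC'], i.e. "Ã¤Ã¶Ã¼"
```

Each misread character takes two bytes in UTF-8, which is why the 6 original bytes grow to 12.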

BTW, the conversions here have nothing to do with Lazarus; it is purely
an FPC issue. If I can't find an answer myself, I'll ask on the FPC
mailing list.

Thanks again

Kind regards

Michl



