[Lazarus] Feature Request: Insert {codepage UTF8} per default

Bart bartjunk64 at gmail.com
Thu Mar 31 14:32:27 CEST 2016


On 3/31/16, Mattias Gaertner <nc-gaertnma at netcologne.de> wrote:

>> AFAIK the IDE does not save the file with a BOM, so the compiler may
>> very well decide that my sourcefile has ACP codepage?
>
> Yes and no.
> When the compiler assumes ACP, it treats the string specially. It does
> not convert it and stores it as byte copy. At runtime the string has
> CP_ACP and its codepage is defined by the variable
> DefaultSystemCodePage. LazUTF8 sets this to CP_UTF8, so the string is
> treated as UTF-8. Note that it does that without any conversion.
>
> OTOH when you tell the compiler that the source is UTF-8, it converts
> the literal to UTF-16. At runtime it converts the string back to UTF-8.
> It does that every time you assign the literal.
>
> So, with both you get a UTF-8 string, but the latter has a bit more
> overhead. Also the latter needs special care when typecasting (e.g.
> PChar).

So, since my real-life use case for string constants with diacritics
is mostly just captions for buttons, menus etc., the extra overhead
is not really something to worry about, I guess. In this scenario
adding {$codepage utf8} may be the wise thing to do: it eliminates
all confusion about the intended encoding of the string constant.

So, my intended approach for GUI applications will be:
- declare all strings as just String
- keep all string constants with Unicode characters in one file, add
{$codepage utf8} to that file, and stop using -FcUTF8
(which is what I use ATM).

That should be rather safe then, I guess.
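
A minimal sketch of what such a constants file could look like. The
unit name and the constant names are made up for illustration; only
the {$codepage utf8} directive and the "plain String" rule come from
the approach above. The file itself is assumed to be saved as UTF-8.

```pascal
unit GuiStrings;  // hypothetical unit name

{$mode objfpc}{$H+}
{$codepage utf8}  // tell the compiler this source file is UTF-8 encoded

interface

const
  // Literals with non-ASCII characters. As described in the quoted
  // explanation, with {$codepage utf8} the compiler stores these as
  // UTF-16 and converts them when they are assigned to a String.
  CaptionOpen  = 'Öffnen';     // hypothetical caption
  CaptionClose = 'Schließen';  // hypothetical caption

implementation

end.
```

Other units would then just use GuiStrings and assign the constants
to captions, without needing the directive or -FcUTF8 themselves.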

Would all this mess go away if we went the Delphi way (String=UnicodeString)?
(I know *nix users are going to hate me now)

Bart



