<p>On 07.05.2017 at 12:07, "Florian Klaempfl via Lazarus" <<a href="mailto:lazarus@lists.lazarus-ide.org">lazarus@lists.lazarus-ide.org</a>> wrote:<br>
><br>
> On 07.05.2017 at 11:57, Graeme Geldenhuys via Lazarus wrote:<br>
> > On 2017-05-07 09:10, Florian Klaempfl via Lazarus wrote:<br>
> >>> Yeah, that would be the logical thing to do.<br>
> >><br>
> >> Why? What makes a string literal UTF-8?<br>
> >><br>
> ><br>
> > As Mattias said, the fact that the source unit is UTF-8 encoded.<br>
> > Defined by a BOM marker, or -Fcutf8 or {$codepage utf8}. If the source<br>
> > unit is UTF-8 encoded, the literal string constant can't (and<br>
> > shouldn't) be in any other encoding.<br>
> ><br>
> > I would say the same if the source unit was stored in UTF-16<br>
> > encoding. Then string literals would be treated as UTF-16.<br>
><br>
> And if an ISO/ANSI codepage is given? Things would probably fail.<br>
><br>
> The point is: FPC is consistent in this regard: sources with a given<br>
> ISO/ANSI codepage are handled the same way as well. If a string<br>
> literal contains non-ASCII characters, it is converted to UTF-16 using<br>
> the codepage of the source. Very simple, very logical. It is a matter of<br>
> preference whether UTF-8, -16, or -32 is chosen at this point, but FPC uses<br>
> UTF-16. If it used UTF-8, the problem would simply occur the other way around.<br>
><br>
> If no codepage is given (by directive, command line, BOM), string<br>
> literals are handled byte-wise as raw strings.</p>
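The conversion Florian describes can be sketched with a small Python model (this is illustrative only, not FPC code; the function name `literal_to_constant` is made up for the example): with a declared source codepage, a non-ASCII literal is decoded using that codepage and stored as a UTF-16 constant; with no codepage, the bytes are kept as a raw string.

```python
# Illustrative model (not FPC source) of the literal handling described
# above: decode the literal's bytes with the declared source codepage and
# store the result as UTF-16; with no codepage, keep the raw bytes as-is.

def literal_to_constant(raw, codepage=None):
    """Model what the compiler stores for a string literal."""
    if codepage is None:
        return raw                       # no codepage: byte-wise raw string
    text = raw.decode(codepage)          # interpret with the source codepage
    return text.encode("utf-16-le")      # store as a UTF-16 constant

# 'ä' in a UTF-8 encoded source file is the two bytes C3 A4; after
# conversion it becomes the single UTF-16 code unit 00E4.
utf8_literal = b"\xc3\xa4"
print(literal_to_constant(utf8_literal, "utf-8"))   # b'\xe4\x00'
print(literal_to_constant(utf8_literal))            # b'\xc3\xa4' (raw)
```

Note that, per Sven's correction below, in practice FPC performs this conversion only when the source codepage is UTF-8, even though the model above would accept any codepage.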
<p>Small correction: FPC only does this conversion if the codepage is UTF-8, not for any other.</p>
<p>Regards,<br>
Sven</p>