[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Sven Barth pascaldragon at googlemail.com
Thu Dec 26 09:36:21 CET 2013


On 26.12.2013 02:19, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>
> Sven Barth wrote:
>
>
>> If in 2.6.2 your three strings contain text of different encodings then
>> the resulting string might be garbage from the user's POV.
>> In trunk the encoding is part of each string and if they differ then
>> each string will be converted to the default string encoding (defined by a
>> global variable inside unit System) and thus the string might still be
>> valid.
>
>
> If so, this flaw should be fixed immediately. Delphi uses lossless
> conversions, i.e. an up-cast to Unicode.

No, it does not. If the variables you concatenate are AnsiString and the
variable or parameter you pass them to is AnsiString as well (AFAIK it even
needs to be RawByteString) then the strings are converted to the system
encoding before they are concatenated and passed. FPC implements this in a
Delphi-compatible way.
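
To illustrate why converting every operand to the system encoding can lose
data, here is a minimal conceptual sketch in Python. It is not FPC or Delphi
code. DEFAULT_CODE_PAGE, concat_via_default and the (bytes, encoding) pairs
are hypothetical stand-ins for the default string encoding and for
AnsiStrings that carry their own code page. The real RTL conversion may
substitute characters differently (e.g. best-fit mapping) instead of "?".

```python
# Conceptual model only: NOT the FPC/Delphi implementation.
DEFAULT_CODE_PAGE = "cp1252"  # assumed stand-in for the system encoding


def concat_via_default(parts):
    """Concatenate (raw_bytes, encoding) pairs by first converting each
    operand to DEFAULT_CODE_PAGE, the way the mail describes it."""
    out = b""
    for raw, enc in parts:
        text = raw.decode(enc)  # interpret the bytes in their own encoding
        # Characters missing from the target code page become "?" here.
        out += text.encode(DEFAULT_CODE_PAGE, errors="replace")
    return out


latin = ("é".encode("cp1252"), "cp1252")  # representable in cp1252
greek = ("λ".encode("utf-8"), "utf-8")    # not representable in cp1252

result = concat_via_default([latin, greek])
print(result)  # b'\xe9?'
```

The "é" survives because cp1252 contains it, while the Greek letter is
replaced. That is the kind of loss being argued about in this thread.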

> Such problems can be avoided by making RawByteString a compiler magic,
> that enforces a Unicode conversion whenever AnsiStrings of a different
> dynamic encoding have to be combined.

RawByteString is already as magical as it gets and is exactly what it says
on the tin: a raw byte string. No automatic conversions ever. This is a type
that is needed for implementing string handling in the RTL, so overloading
it with another meaning will only result in problems.
If you want UTF-8 encoded strings then use UTF8String. Period.

>
> Furthermore the use of UTF-8 will allow for lossless conversions of
> AnsiStrings of any encoding, with the result still being an AnsiString.
> Here Delphi has the problem that a RawByteString result type requires a
> conversion of an intermediate Unicode string (UTF-16) into an
> AnsiString(CP_ACP), with possible losses. This is not required when FPC
> treats UTF-8 as a fully supported encoding, in addition to CP_ACP - it
> would also be a strong argument for using UTF-8 for UnicodeString,
> *instead* of UTF-16. The related functions already exist in the FPC
> libraries, they only have to take precedence over CP_ACP (if different).
> Then additional UTF-8/16 conversions are required only on Windows, when
> calling external (API...) functions which expect/return WideStrings.

UnicodeString is *defined* as a reference-counted string of 2-byte
characters. There will be no change there.
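
As a rough illustration of what "2-byte characters" means in practice, this
Python sketch counts UTF-16 code units and UTF-8 bytes for a few strings. It
is not Pascal code, and the helper names utf16_units and utf8_bytes are made
up for this example.

```python
def utf16_units(s):
    """Number of 2-byte UTF-16 code units needed to store s."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s):
    """Number of bytes needed to store s as UTF-8."""
    return len(s.encode("utf-8"))


for sample in ("A", "é", "λ", "😀"):
    print(repr(sample), utf16_units(sample), utf8_bytes(sample))
```

Characters outside the Basic Multilingual Plane, such as the emoji, need two
UTF-16 code units (a surrogate pair), so even a 2-byte element type does not
give one element per character.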

Regards,
Sven