[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Hans-Peter Diettrich DrDiettrich1 at aol.com
Thu Dec 26 12:28:45 CET 2013


Sven Barth wrote:
> On 26.12.2013 at 02:19, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>  >
>  > Sven Barth wrote:
>  >
>  >
>  >> If in 2.6.2 your three strings contain text of different encodings
>  >> then the resulting string might be garbage from the user's POV.
>  >> In trunk the encoding is part of each string and if they differ then
>  >> each string will be converted to the default string encoding (defined
>  >> by a global variable inside unit System) and thus the string might
>  >> still be valid.
>  >
>  >
>  > If so, this flaw should be fixed immediately. Delphi uses lossless
>  > conversions, i.e. an up-cast to Unicode.
> 
> No it does not. If the variables you concatenate are AnsiString and the 
> variable or parameter you pass them to is AnsiString as well (AFAIK it 
> even needs to be RawByteString) then the strings are converted to the 
> system encoding before they are concatenated and passed. FPC implements
> this in a Delphi-compatible way.

Please specify: "AnsiString" of which encoding?

When I concatenate an AnsiString and a UTF8String and assign the result to
an OEMString
   o := a + u;
I get these warnings in Delphi XE:

[DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string'
[DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string'
[DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString'

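According to these warnings, both operands are first cast to 'string'
(UTF-16), and only the result is cast down to OEMString. I cannot see the
system codepage being used anywhere here.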
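
For reference, a minimal sketch of such a test program could look like the
one below. The OEMString declaration, the code page 850 and the sample
literals are assumptions made only for illustration, and the line numbers
will not match those in the warnings above.

```pascal
program ConcTest;
{$APPTYPE CONSOLE}

type
  // Assumption: OEMString is a user-declared code-page string type;
  // 850 is only an example OEM code page.
  OEMString = type AnsiString(850);

var
  a: AnsiString;   // system (CP_ACP) encoding
  u: UTF8String;   // UTF-8 encoding
  o: OEMString;    // OEM encoding (see the assumption above)
begin
  a := 'abc';
  u := 'def';
  // Operands with different static encodings: this is the statement
  // for which the compiler reports the implicit string casts.
  o := a + u;
  // Print the dynamic code page of the result for inspection.
  WriteLn(StringCodePage(o));
end.
```

Compiling this with warnings enabled should show which intermediate type
the compiler chooses for the concatenation.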


What I want to point out are the overloads of the string functions: Delphi
supplies only versions with string (UTF-16) and RawByteString arguments,
plus AnsiString(CP_ACP) versions in unit AnsiStrings. FPC could add
UTF8String overloads and use them when dealing with AnsiStrings whose
encoding differs from CP_ACP.

>  > Such problems can be avoided by making RawByteString compiler
>  > magic that enforces a Unicode conversion whenever AnsiStrings of a
>  > different dynamic encoding have to be combined.
> 
> RawByteString is already as magical as it gets and is exactly what it
> says on the tin: a raw byte string. No automatic conversions ever. This
> is a type that is needed for implementing string handling in the RTL, so
> overloading it with another meaning will only result in problems.
> If you want UTF-8 encoded strings then use UTF8String. Period.

Please understand that the use of RawByteString in Delphi can lead to 
strings with wrong encoding. This type should not be available for 
declaring variables, only for parameters and function results. This 
restriction requires compiler magic.


> UnicodeString is *defined* as 2-Byte character reference counted string. 
> There will be no change there.

Sorry, I meant the generic string type.

DoDi




