[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Hans-Peter Diettrich DrDiettrich1 at aol.com
Fri Dec 27 09:57:07 CET 2013


Sven Barth wrote:
> On 26.12.2013 17:02, Sven Barth wrote:
>> On 26.12.2013 12:30, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>>  >
>>  > Sven Barth wrote:
>>  >>
>>  >> On 26.12.2013 02:19, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>>  > Please specify "AnsiString", of which encoding?
>>  >
>>  > When I concat an AnsiString and an UTF8String and assign it to an
>>  > OEMString
>>  >   o := a + u;
>>  > then I get these warnings in XE:
>>  >
>>  > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
>>  > 'AnsiString' to 'string'
>>  > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
>>  > 'UTF8String' to 'string'
>>  > [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with
>>  > potential data loss from 'string' to 'OEMString'
>>  >
>>  > I cannot see the system codepage used here.
>>
>> Try to make o of type RawByteString. And maybe also use more than two
>> strings.

As I already stated: RawByteString is not for application use!


> Ok, I didn't remember the situation correctly. When searching for Jonas' 
> mail I mentioned below I also found this which I was referring to:
> 
> === quote of Jonas begin ===
> 
> var
>   mypath: utf8string;
>   sr: tsearchrec;
> begin
>   { assign some utf-8 string to mypath }
>   if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
>     begin
>       ...
>     end;
> end

Delphi has no problem with this code, because all strings are upgraded 
to UnicodeString.

> If the DefaultSystemCodePage is something different from UTF-8, the 
> result of "mypath+allfilesmask" will be downgraded to 
> DefaultSystemCodePage because the string constant "allfilesmask" is 
> encoded using that code page.

Delphi has no rule of "downgrading".

When mypath+allfilesmask is assigned to a variable, the result has the 
correct encoding, not necessarily CP_ACP.
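
One way to check this is to repeat the concatenation from the quoted ConcTest example and print the code page of the destination. The following is only a minimal sketch, assuming Delphi 2009 or later (or a code-page-aware FPC). The declaration of OEMString is not shown in the thread, so the one below, including the choice of code page 850, is an assumption made for illustration:

```pascal
program ConcTest;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

type
  // Assumed declaration: the thread does not show how OEMString was
  // defined; 850 is just one common OEM code page.
  OEMString = type AnsiString(850);

var
  a: AnsiString;   // encoding follows the default system code page
  u: UTF8String;   // encoding is UTF-8
  o: OEMString;    // encoding is fixed by the type declaration above
begin
  a := 'abc';
  u := 'def';
  // The statement for which the warnings W1057 and W1058 were reported.
  o := a + u;
  // Print the code page the destination string actually carries.
  Writeln(StringCodePage(o));
end.
```

The operands are deliberately plain ASCII; the point is only to inspect which code page the result carries, not to demonstrate data loss.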


> This is due to rule that "concatenating 
> ansistrings with different encodings results in an ansistring with the 
> encoding of the destination ansistring" is followed, and the destination 
> ansistring is a rawbytestring here (the first argument of findfirst), in 
> which case the ansi encoding is used.

Again: RawByteString is a mess, should be used with care.

The first argument of FindFirst (the file mask) certainly *cannot* be a 
RawByteString.

> === quote of Jonas end ===
> 
>>  >
>>  >
>>  > What I want to point out are the string function overloads, where
>>  > Delphi supplies only string (UTF-16) and RawByteString arguments, and
>>  > AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String
>>  > overloads and use these when dealing with AnsiStrings of an encoding
>>  > different from CP_ACP.
>>
>> That was already discussed some time ago between devs and was deemed not
>> useable by Jonas. I'll try to find his mail with his explanation.
> 
> === quote of Jonas begin ===
> 
> Adding explicitly named UTF-8 versions of routines with constant or value
> rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and
> that internally simply call through to the rawbytestring versions could
> perhaps be useful.  Interestingly, Lazarus users probably won't suffer
> from this particular problem as they already use such routines from the
> LCL, and those routines can simply be adapted by simply removing all the
> UTF8ToSys calls (they will keep working in their current state though,
> they simply keep suffering from the same data loss issues they had
> before).
> 
> === quote of Jonas end ===

I see no argument for or against UTF-8 overloads here.


> Please note that Jonas states that different named overloads would be 
> needed. Equally named UTF8String overloads won't necessarily work 
> correctly.

You see the need for making RawByteString a compiler magic? :-]
It should be used only as the last resort, when no other string type 
matches a given string encoding.

As for FindFirst, a choice of encoding for the mask string exists only on 
Windows, depending on whether the A or the W API is used. Other targets 
have a dedicated encoding for filenames, which should be used in all file 
and directory functions. Even on Windows only the W API should be used 
nowadays; the A API (as used in older Delphi versions) was only there to 
support legacy Win9x systems, where not all W versions were available.
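
To illustrate the W API route, here is a rough sketch (Windows only) that converts a UTF-8 mask to UTF-16 explicitly and passes it to FindFirstFileW, so that no ANSI code page is involved. The program name, the variable names, and the mask '*' are placeholders chosen for illustration. This is not how FindFirst is implemented in either RTL:

```pascal
program FindWide;
{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

uses
  Windows;

var
  mask: UTF8String;        // mask as held by the application (UTF-8)
  wmask: UnicodeString;    // the same mask converted to UTF-16
  fd: TWin32FindDataW;
  h: THandle;
  count: Integer;
begin
  mask := '*';                      // placeholder mask
  wmask := UTF8Decode(mask);        // explicit UTF-8 to UTF-16 conversion
  count := 0;
  // Call the W variant directly; the A variant is never used.
  h := FindFirstFileW(PWideChar(wmask), fd);
  if h <> INVALID_HANDLE_VALUE then
  begin
    repeat
      Inc(count);
    until not FindNextFileW(h, fd);
    Windows.FindClose(h);
  end;
  Writeln(count, ' entries');
end.
```

Newer Delphi versions flag UTF8Decode as deprecated in favour of UTF8ToString; it is used here only because it is available in both Delphi and FPC.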

DoDi




