[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Fri Dec 27 09:57:07 CET 2013
Sven Barth wrote:
> On 26.12.2013 17:02, Sven Barth wrote:
>> On 26.12.2013 12:30, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>> >
>> > Sven Barth wrote:
>> >>
>> >> On 26.12.2013 02:19, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
>> > Please specify "AnsiString", of which encoding?
>> >
>> > When I concatenate an AnsiString and a UTF8String and assign the
>> result to an OEMString
>> > o := a + u;
>> > then I get these warnings in XE:
>> >
>> > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
>> 'AnsiString' to 'string'
>> > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
>> 'UTF8String' to 'string'
>> > [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with
>> potential data loss from 'string' to 'OEMString'
>> >
>> > I cannot see the system codepage used here.
>>
>> Try to make o of type RawByteString. And maybe also use more than two
>> strings.
As I already stated: RawByteString is not for application use!
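For reference, the concatenation test behind the warnings quoted above can be sketched roughly as follows. This is a minimal reconstruction, not the original ConcTest.dpr: the OEMString declaration (CP437 here) and the string contents are assumptions, and the line numbers will not match those in the warnings.

```pascal
program ConcTest;

{$ifdef FPC}{$mode delphiunicode}{$endif}
{$APPTYPE CONSOLE}

type
  // Assumption: OEMString is an AnsiString bound to an OEM code page.
  // CP437 is picked only for illustration.
  OEMString = type AnsiString(437);

var
  a: AnsiString;   // CP_ACP, i.e. the system ANSI code page
  u: UTF8String;   // CP_UTF8
  o: OEMString;
begin
  a := 'abc';
  u := 'def';
  // Operands with different code pages: Delphi converts both to
  // UnicodeString, concatenates, then converts the result to the
  // code page declared for the destination.
  o := a + u;
  // Show the code page that the result actually carries.
  WriteLn(StringCodePage(o));
end.
```

With Delphi 2009 or later the assignment line should trigger W1057/W1058 warnings of the kind quoted above; StringCodePage is also available in FPC 3.0 and later, so the same sketch can be used to compare both compilers.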
> Ok, I didn't remember the situation correctly. When searching for Jonas'
> mail I mentioned below I also found this which I was referring to:
>
> === quote of Jonas begin ===
>
> var
> mypath: utf8string;
> sr: tsearchrec;
> begin
> { assign some utf-8 string to mypath }
> if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
> begin
> ...
> end;
> end
Delphi has no problem with this code, because all strings are upgraded
to UnicodeString.
> If the DefaultSystemCodePage is something different from UTF-8, the
> result of "mypath+allfilesmask" will be downgraded to
> DefaultSystemCodePage because the string constant "allfilesmask" is
> encoded using that code page.
Delphi has no rule of "downgrading".
When mypath+allfilesmask is assigned to a variable, the result has the
correct encoding, not necessarily CP_ACP.
> This is because the rule "concatenating ansistrings with different
> encodings results in an ansistring with the encoding of the destination
> ansistring" is followed, and the destination ansistring is a
> rawbytestring here (the first argument of findfirst), in which case the
> ansi encoding is used.
Again: RawByteString is a mess and should be used with care.
The first argument of FindFirst (the file mask) certainly *cannot* be a
RawByteString.
> === quote of Jonas end ===
>
>> >
>> >
>> > What I want to point out are the string function overloads, where
>> Delphi supplies only string (UTF-16) and RawByteString arguments, and
>> AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String
>> overloads and use these when dealing with AnsiStrings of an encoding
>> different from CP_ACP.
>>
>> That was already discussed some time ago between devs and was deemed not
>> useable by Jonas. I'll try to find his mail with his explanation.
>
> === quote of Jonas begin ===
>
> Adding explicitly named UTF-8 versions of routines with constant or value
> rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and
> that internally simply call through to the rawbytestring versions could
> perhaps be useful. Interestingly, Lazarus users probably won't suffer
> from this particular problem as they already use such routines from the
> LCL, and those routines can be adapted by simply removing all the
> UTF8ToSys calls (they will keep working in their current state though,
> they simply keep suffering from the same data loss issues they had
> before).
>
> === quote of Jonas end ===
I see no argument for or against UTF-8 overloads here.
> Please note that Jonas states that differently named overloads would be
> needed. Identically named UTF8String overloads won't necessarily work
> correctly.
Do you see the need to make RawByteString compiler magic? :-]
It should be used only as a last resort, when no other string type
matches a given string encoding.
As for FindFirst, a choice of mask string type exists only on Windows,
depending on whether the A or W API is used. Other targets have a
dedicated encoding for filenames, which should be used in all file and
directory functions. Even on Windows only the W API should be used
nowadays; the A API (as used in older Delphi versions) was only there to
support legacy Win9x systems, where not all W versions of the API
functions were available.
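To illustrate the point about the W API, here is a minimal sketch, assuming Delphi 2009 or later on Windows, where SysUtils.FindFirst takes a UnicodeString mask. The directory and the '*' mask are placeholders; FPC's overloads and search record types may differ, so this is not meant as a statement about the FPC RTL.

```pascal
program FindDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  mypath: UTF8String;
  mask: UnicodeString;
  sr: TSearchRec;
begin
  mypath := '.\';  // placeholder: any UTF-8 encoded directory path
  // Convert explicitly from UTF-8 to UTF-16. This conversion is lossless,
  // so no character depends on the system ANSI code page.
  mask := UnicodeString(mypath) + '*';
  if FindFirst(mask, faAnyFile, sr) = 0 then
  try
    repeat
      WriteLn(sr.Name);
    until FindNext(sr) <> 0;
  finally
    FindClose(sr);
  end;
end.
```

The explicit cast only makes visible what the compiler does implicitly when a UTF8String is passed to a UnicodeString parameter.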
DoDi