[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?
Sven Barth
pascaldragon at googlemail.com
Thu Dec 26 18:34:42 CET 2013
On 26.12.2013 17:02, Sven Barth wrote:
> On 26.12.2013 12:30, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
> >
> > Sven Barth wrote:
> >>
> >> On 26.12.2013 02:19, "Hans-Peter Diettrich" <DrDiettrich1 at aol.com> wrote:
> > Please specify "AnsiString", of which encoding?
> >
> > When I concatenate an AnsiString and a UTF8String and assign the
> > result to an OEMString
> > o := a + u;
> > then I get these warnings in XE:
> >
> > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string'
> > [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string'
> > [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString'
> >
> > I cannot see the system codepage used here.
>
> Try to make o of type RawByteString. And maybe also use more than two
> strings.
>
OK, I didn't remember the situation correctly. While searching for the
mail from Jonas that I mention below, I also found this one, which is
what I was actually referring to:
=== quote of Jonas begin ===
var
  mypath: utf8string;
  sr: tsearchrec;
begin
  { assign some utf-8 string to mypath }
  if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
    begin
      ...
    end;
end
If the DefaultSystemCodePage is something different from UTF-8, the
result of "mypath+allfilesmask" will be downgraded to
DefaultSystemCodePage because the string constant "allfilesmask" is
encoded using that code page. This follows from the rule that
"concatenating ansistrings with different encodings results in an
ansistring with the encoding of the destination ansistring": the
destination ansistring is a rawbytestring here (the first argument of
findfirst), in which case the ansi encoding is used.
=== quote of Jonas end ===
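
To make the rule Jonas describes easier to try out, here is a minimal
sketch, assuming a recent FPC with code page aware strings.
ShowCodePage is a hypothetical helper of mine, not an RTL routine; it
takes a rawbytestring parameter just like the first argument of
findfirst. I give no expected output because the numbers depend on the
DefaultSystemCodePage of the machine it runs on.

```pascal
program ConcatCodePageSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: reports the dynamic code page of whatever string
  arrives in a rawbytestring parameter. }
procedure ShowCodePage(const Msg: string; const S: RawByteString);
begin
  WriteLn(Msg, ': ', StringCodePage(S));
end;

var
  u: UTF8String;
  a: AnsiString;
begin
  u := 'some/path/';
  a := '*';

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);

  { Passed on its own, the UTF8String should keep its code page. }
  ShowCodePage('u', u);

  { Mixed concatenation with a rawbytestring destination: per the rule
    quoted above, the result is expected to use the ansi encoding. }
  ShowCodePage('u + a', u + a);
end.
```

If the last line reports the same value as DefaultSystemCodePage while
the first reports the UTF-8 code page, that is the downgrade described
above.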
> >
> >
> > > What I want to point out are the string function overloads, where
> > > Delphi supplies only string (UTF-16) and RawByteString arguments, and
> > > AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String
> > > overloads and use these when dealing with AnsiStrings of an encoding
> > > different from CP_ACP.
>
> That was already discussed some time ago between the devs and was deemed
> not usable by Jonas. I'll try to find his mail with his explanation.
=== quote of Jonas begin ===
Adding explicitly named UTF-8 versions of routines with constant or value
rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and
that internally simply call through to the rawbytestring versions could
perhaps be useful. Interestingly, Lazarus users probably won't suffer
from this particular problem as they already use such routines from the
LCL, and those routines can be adapted by removing all the UTF8ToSys
calls (they will keep working in their current state, though; they will
just keep suffering from the same data loss issues they had before).
=== quote of Jonas end ===
Please note that Jonas states that differently named routines would be
needed. Identically named UTF8String overloads won't necessarily work
correctly.
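
As a rough illustration of what such a differently named routine could
look like, here is a sketch under my own assumptions. The FindFirstUTF8
below is a hypothetical wrapper written for this mail, not the LCL
routine of that name and not part of the RTL; it only takes a UTF8String
and calls through to the rawbytestring version, as Jonas describes.

```pascal
program FindFirstUTF8Sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical wrapper: the UTF8String parameter becomes the destination
  of any concatenation done at the call site. }
function FindFirstUTF8(const Path: UTF8String; Attr: LongInt;
  out Rslt: TSearchRec): LongInt;
begin
  { Call straight through to the existing RTL routine. }
  Result := FindFirst(Path, Attr, Rslt);
end;

var
  mypath: UTF8String;
  sr: TSearchRec;
begin
  mypath := '.' + DirectorySeparator;

  { The destination of mypath + AllFilesMask is now a UTF8String, so by
    the concatenation rule the result should be UTF-8 encoded. }
  if FindFirstUTF8(mypath + AllFilesMask, faAnyFile, sr) = 0 then
  begin
    repeat
      WriteLn(sr.Name);
    until FindNext(sr) <> 0;
  end;
  FindClose(sr);
end.
```

The point of the separate name is that the caller explicitly picks the
UTF8String destination instead of relying on overload resolution.
Whether the RTL underneath then handles the UTF-8 bytes correctly
depends on the FPC version and is outside this sketch.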
Regards,
Sven