[Lazarus] Converting all code to use UnicodeString

Marcos Douglas B. Santos md at delfire.net
Mon Sep 25 23:35:02 CEST 2017


On Mon, Sep 25, 2017 at 6:10 PM, Sven Barth via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
>> [...]
>> Yes, but using {$modeswitch unicodestrings}, at least in a certain
>> unit, should work with the same code between compilers because
>> "string", for that unit, is UnicodeString as Delphi string is, no?
>
> Yes, but it does not change the types of functions, classes, etc. that
> are used. They have the types they were compiled with while you are
> using a different string type. So you can't simply override a virtual
> method for example that has a String argument that is in fact an
> AnsiString with a method that has a String that's a UnicodeString as
> argument. So of course there will be warnings in case you're passing
> UnicodeString variables to AnsiString variables.

I saw that many RTL functions have an overload like this:
Function FileExists (Const FileName : RawByteString) : Boolean;
Function FileExists (Const FileName : UnicodeString) : Boolean;

The first one calls the second:
Function FileExists (Const FileName : RawByteString) : Boolean;
begin
  Result:=FileExists(UnicodeString(FileName));
end;

My question is:
Whatever the encoding of FileName: RawByteString is, if I cast it to
UnicodeString, will I lose any characters?
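
To make the question concrete, here is a minimal sketch (not taken from
the RTL) of the cast applied to the same two bytes under two different
code page tags. It assumes FPC 3.0 or later, where SetCodePage and the
CP_UTF8 constant live in the System unit, and it assumes that on Unix
the cwstring unit is needed as the widestring manager for non-ASCII
conversions. The program name and variable names are made up for
illustration. As far as I understand, the cast converts according to
the code page stored in the string, so the result is only as good as
that tag.

```pascal
program CastDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // assumption: needed on Unix for non-ASCII conversions
{$endif}

var
  Raw: RawByteString;
  U: UnicodeString;
  i: Integer;

// Print every UTF-16 code unit of S as a decimal number.
procedure Dump(const Title: string; const S: UnicodeString);
begin
  WriteLn(Title, ': ', Length(S), ' code unit(s)');
  for i := 1 to Length(S) do
    WriteLn('  ', Ord(S[i]));
end;

begin
  // Two bytes that form the UTF-8 encoding of U+00E9.
  SetLength(Raw, 2);
  Raw[1] := #$C3;
  Raw[2] := #$A9;

  // Tag the bytes as UTF-8 without converting them, then cast.
  SetCodePage(Raw, CP_UTF8, False);
  U := UnicodeString(Raw);
  Dump('tagged as UTF-8', U);

  // Same bytes, deliberately mislabelled as Windows-1252, then cast.
  SetCodePage(Raw, 1252, False);
  U := UnicodeString(Raw);
  Dump('tagged as CP1252', U);
end.
```

If the tag matches the bytes, the cast should not drop anything; if the
tag is wrong, the bytes are decoded as the wrong characters, which is a
different failure from losing them.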

>> Yes, Lazarus does that by default. But did you see in my examples, in
>> the first email, how many inconsistencies I got, using just Lazarus
>> and changing the chars in one simple constant?
>
> Note: I'll ignore the GUI example, cause Ondrej might be better for that.

No problem.

> For the console you need to keep in mind that the console - at least on
> Windows - has a code page as well. On my Linux - which is set to UTF-8 -
> your example works without any problem, but if I use Wine I get the same
> output as you.

OK, but the compiler knows whether a program is a CLI, I believe... so
it could set variables such as DefaultSystemCodePage and
DefaultFileSystemCodePage accordingly.
For users (developers) this is not clear, do you agree?
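
A small sketch for inspecting the values in question at run time. It
assumes FPC 3.0 or later, where DefaultSystemCodePage,
DefaultFileSystemCodePage and GetTextCodePage are provided by the System
unit; the program name is made up. The numbers it prints depend on the
OS, the console settings and the units linked in, so no particular
output is implied.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

begin
  // Code page assumed for AnsiString(CP_ACP) data.
  WriteLn('DefaultSystemCodePage     = ', DefaultSystemCodePage);
  // Code page used when file names are passed to the OS.
  WriteLn('DefaultFileSystemCodePage = ', DefaultFileSystemCodePage);
  // Code page associated with the standard output text file.
  WriteLn('Output text code page     = ', GetTextCodePage(Output));
end.
```

Running it once in a native console and once under Wine would show
whether the two environments really start from different code pages.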

>>[...]
>> I know almost nothing about compilers. But IMHO, the compiler should
>> keep what it already has: "string", which is an alias.
>> Then, for each OS, we should pass one argument like (simplifying):
>> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
>> understood).
>
> The compiler is not the problem. It's that especially the low level part
> of the RTL needs to be aware of the String type and handle it correctly.
> Essentially all functions will need to be checked whether they can
> correctly handle String (as in the generic string type) or are specific
> for AnsiString and thus would need to be adjusted.

I see...

>> I mean, we should not have overload functions, but only one type of
>> string. Even if that type may be RawByteString.
>
> You are wrong. Think about functions reading or writing data from/to
> files. Especially when the data was written with the other String type
> in mind.

It is normal for external data (files) to have different encodings.
IMO, only in these cases should we care about encoding, because
external data is outside of our code and we cannot control it.

>> Once compiled, we will have an RTL that works according to the "-S" argument.
>>
>>> So the RTL will be adjusted in a way that it can be easily
>>> compiled with String = UnicodeString or as is now with String =
>>> AnsiString(CP_ACP). But we are not there yet.
>>
>> Now we're talking.
>> Almost everyone who knows how to work with "the group of strings",
>> making them compatible between FPC and Delphi, is saying that Unicode
>> is already done and everything is fine. You are the first one to say
>> that it is not complete yet. Thank you. I'm glad to know that I'm not
>> crazy.
>
> Unicode itself is working, but in the form of UTF-8, not UTF-16 and as
> such it is as compatible to Delphi as it can currently get with some
> caveats when the specific type is important.

Well, I only set {$mode delphi} and {$modeswitch unicodestrings}, never
left Lazarus, and still got strange results... it looks like the FPC
flags are not compatible with each other or with Lazarus.
Again, I know that you, Mattias and many others understand this
perfectly. But my examples were very simple, and they still didn't work
correctly using just FPC and Lazarus.

Regards,
Marcos Douglas

