[Lazarus] Converting all code to use UnicodeString

Mon Sep 25 23:10:04 CEST 2017

On 25.09.2017 22:18, Marcos Douglas B. Santos via Lazarus wrote:
> Hi Sven,
> First of all, thanks for taking the time to answer me.
> 
> On Mon, Sep 25, 2017 at 4:43 PM, Sven Barth via Lazarus
> <lazarus at lists.lazarus-ide.org> wrote:
>> On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
>>> I understand using IFDEF to compile for different platforms like Windows
>>> vs... err... Haiku. Or Linux vs Nintendo Wii...
>>> But why should I use IFDEF in code that should be the same in both
>>> compilers (FPC vs Delphi)?
>>
>> Because they *aren't* the same. In Delphi String = UnicodeString, while
>> in the RTL, the FCL and the LCL String = AnsiString(CP_ACP), and using a
>> different modeswitch *does not* change that, because modes are unit-specific.
> 
> Yes, but with {$modeswitch unicodestrings}, at least in a given unit,
> the same code should work with both compilers, because "string" in that
> unit is UnicodeString, just like Delphi's string, no?

Yes, but it does not change the types of the functions, classes, etc.
that you use. They keep the types they were compiled with, while you are
using a different string type. So you can't, for example, simply override
a virtual method whose String argument is in fact an AnsiString with a
method whose String argument is a UnicodeString. And of course there will
be warnings when you pass UnicodeString variables where AnsiString is
expected.
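
To make the "potential data loss" concrete, the following Python snippet (Python only as a neutral illustration, not FPC code) mimics what happens when a UnicodeString value ends up in a single-byte AnsiString. The function name `to_ansi` is hypothetical, and cp1252 is an assumed CP_ACP of a Western European Windows system; characters the code page cannot represent are replaced with '?'. FPC's own conversion may differ in detail, so treat this only as an analogy.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Hypothetical stand-in for assigning a UnicodeString to an AnsiString.

    Characters that the (assumed) single-byte code page cannot represent
    are replaced with '?', i.e. that information is lost.
    """
    return text.encode(codepage, errors="replace")


print(to_ansi("café"))    # b'caf\xe9'  (fits into cp1252)
print(to_ansi("Привет"))  # b'??????'   (Cyrillic does not fit, data lost)
```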

>> Especially the RTL is not ready for String = UnicodeString. So your best
>> bet is to use UTF8String or set the default code page to UTF8 (the LCL
>> units do that by default if I remember correctly, but Ondrej can confirm
>> or deny that).
> 
> Yes, Lazarus does that by default. But did you see, in the examples in
> my first email, how many inconsistencies I got just by using Lazarus
> and changing the characters in one simple constant?

Note: I'll ignore the GUI example, because Ondrej is better placed to answer that.

For the console you need to keep in mind that the console, at least on
Windows, has a code page as well. On my Linux system, which is set to
UTF-8, your example works without any problem, but if I use Wine I get
the same output as you.
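
The effect can be simulated in Python (used here only for illustration): the program emits UTF-8 bytes, and the console decodes them with its own code page. The function `show_on_console` is hypothetical, and cp850 stands in for an assumed Windows OEM console code page; the actual code page depends on the system's locale.

```python
def show_on_console(text: str, program_encoding: str, console_codepage: str) -> str:
    """Simulate a program writing `text` as bytes in `program_encoding`
    to a console that interprets those bytes using `console_codepage`."""
    raw = text.encode(program_encoding)
    return raw.decode(console_codepage, errors="replace")


# UTF-8 console (as on a typical Linux setup): text is shown unchanged.
print(show_on_console("Olá", "utf-8", "utf-8"))  # Olá
# Console using OEM code page 850: the two UTF-8 bytes of "á" are shown
# as two unrelated characters.
print(show_on_console("Olá", "utf-8", "cp850"))
```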

>>> Will it be slower than now? Yes, maybe... but we already use objects!
>>> If you want 500% performance, use pointers, records and procedures
>>> with whatever encoding you want. But if you use objects, the overhead
>>> already exists... and who cares? 1ms... 2ms... even 2s that you may
>>> lose using UTF16? (or UTF8, but make everything the same!) So? The
>>> world is using Ruby and they don't care... or Python, Java... and they
>>> store strings in UTF16 too, which requires double the space... but if
>>> it works and the code is clean, that should be more important, don't
>>> you agree?
>>
>> For FPC, more restricted targets (AVR, DOS, etc.) also have to be kept
>> in mind.
> 
> I know almost nothing about compilers. But IMHO, the compiler should
> have what it already has: "string", which is an alias.
> Then, for each OS, we would pass an argument like (simplifying):
> -S=UnicodeString  or -S=AnsiString... something like that (I hope you
> understand what I mean).

The compiler is not the problem. The problem is that the RTL, especially
its low-level part, needs to be aware of the String type and handle it
correctly. Essentially every function needs to be checked for whether it
can correctly handle String (as in the generic string type) or is
specific to AnsiString and thus needs to be adjusted.
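
As an illustration of an AnsiString-specific assumption, here is a Python sketch (not RTL code; `first_n` is a hypothetical routine) of a function that treats one byte as one character. It is correct for a single-byte code page, but cuts a character in half once a character needs more than one byte, as in UTF-8; the same happens with the two-byte code units of UTF-16.

```python
def first_n(data: bytes, n: int) -> bytes:
    """Hypothetical routine written with single-byte AnsiString in mind:
    it assumes one byte is one character and simply slices."""
    return data[:n]


ansi = "Olá".encode("cp1252")  # b'Ol\xe1': one byte per character
utf8 = "Olá".encode("utf-8")   # b'Ol\xc3\xa1': 'á' needs two bytes

print(first_n(ansi, 3).decode("cp1252"))  # Olá
# Only the first byte of 'á' survives, so decoding yields a replacement character.
print(first_n(utf8, 3).decode("utf-8", errors="replace") == "Ol\ufffd")  # True
```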

> I mean, we should not have overloaded functions, but only one string
> type, even if that type is RawByteString.

You are wrong. Think about functions that read or write data from/to
files, especially when the data was written with the other String type
in mind.
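
A minimal Python sketch of that problem (illustration only; `save_raw` and `load_raw` are hypothetical helpers, UTF-16LE stands in for the in-memory layout of a UnicodeString and cp1252 for an assumed AnsiString(CP_ACP)): data written with one string type in mind reads back as garbage when read with the other.

```python
def save_raw(text: str, unicode_string: bool) -> bytes:
    """Dump the raw character data as a program would write it to a file."""
    return text.encode("utf-16-le" if unicode_string else "cp1252")


def load_raw(data: bytes, unicode_string: bool) -> str:
    """Read raw character data back, assuming the given string type."""
    return data.decode("utf-16-le" if unicode_string else "cp1252", errors="replace")


written = save_raw("Olá", unicode_string=True)  # b'O\x00l\x00\xe1\x00'
print(load_raw(written, unicode_string=True) == "Olá")  # True
# Read back by code that expects AnsiString: every second "character" is #0.
print(ascii(load_raw(written, unicode_string=False)))   # 'O\x00l\x00\xe1\x00'
```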

> 
> Once compiled, we would have an RTL that follows the "-S" argument.
> 
>> So the RTL will be adjusted in a way that it can easily be
>> compiled with String = UnicodeString or, as now, with String =
>> AnsiString(CP_ACP). But we are not there yet.
> 
> Now we're talking.
> Almost everyone who knows how to work with "the group of strings",
> making them compatible between FPC and Delphi, says that Unicode
> support is already done and everything is fine. You are the first one
> to say that it is not complete yet. Thank you. I'm glad to know that
> I'm not crazy.

Unicode itself is working, but in the form of UTF-8, not UTF-16, and as
such it is as compatible with Delphi as it can currently get, with some
caveats when the specific string type is important.
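
One such caveat is Length(): in FPC it counts bytes for a UTF8String, while for a UnicodeString (and Delphi's String) it counts UTF-16 code units. The Python sketch below (illustration only; `length_as` is a hypothetical helper) reproduces both counts:

```python
def length_as(text: str, string_type: str) -> int:
    """Number of elements Length() would report for `text` under the
    given string type: bytes for UTF8String, UTF-16 code units for UnicodeString."""
    if string_type == "UTF8String":
        return len(text.encode("utf-8"))
    if string_type == "UnicodeString":
        return len(text.encode("utf-16-le")) // 2
    raise ValueError(f"unknown string type: {string_type}")


for s in ("abc", "Olá", "\U0001F600"):
    print(ascii(s), length_as(s, "UTF8String"), length_as(s, "UnicodeString"))
# 'abc' 3 3
# 'Ol\xe1' 4 3
# '\U0001f600' 4 2
```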

Regards,
Sven