[Lazarus] Converting all code to use UnicodeString

Sven Barth pascaldragon at googlemail.com
Mon Sep 25 21:43:04 CEST 2017


On 25.09.2017 20:51, Marcos Douglas B. Santos via Lazarus wrote:
> I understand using IFDEF to compile for different platforms like Windows
> vs... err... Haiku. Or Linux vs Nintendo Wii...
> But why should I use IFDEF in code that should be the same in both
> compilers (FPC vs Delphi)?

Because they *aren't* the same. In Delphi String = UnicodeString, while
in the RTL, the FCL and the LCL String = AnsiString(CP_ACP), and using a
different modeswitch in your own code *does not* change that, because
modes are unit-specific.
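
A minimal sketch of what "unit-specific" means in practice, assuming FPC
3.0.x and its stock RTL. The program name and variables are made up for
illustration; the point is only that the mode directive changes String in
this one file, while SysUtils keeps the string type it was compiled with.

```pascal
program StringTypeDemo;

// In this file only, String means UnicodeString (UTF-16), as in Delphi.
// SysUtils was compiled with its own mode, so inside SysUtils
// String is still AnsiString(CP_ACP).
{$mode delphiunicode}

uses
  SysUtils;

var
  S: String;       // UnicodeString here, because of the mode above
  A: AnsiString;   // explicitly an AnsiString, whatever the mode
begin
  // Char follows String: it is WideChar in this mode.
  WriteLn('SizeOf(Char) in this file: ', SizeOf(Char));

  // Assumption for the FPC 3.0.x RTL: IntToStr returns the RTL's String,
  // i.e. an AnsiString, so this assignment converts to UTF-16 implicitly.
  S := IntToStr(42);

  // Going back to an AnsiString is another conversion, made explicit here.
  A := AnsiString(S);

  WriteLn(S, ' ', A);
end.
```

Every call across the unit boundary can involve such a conversion, which is
why code shared between FPC and Delphi has to know which string type it is
dealing with.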

> Is it because the string type is not Unicode? OK, so I want to convert
> everything to use UTF16, i.e. UnicodeString (wrong name), and make ALL
> code compatible. But this looks like it is not possible without:
> 
> * IFDEFs
> * knowing a few {modes}
> * knowing what type of string I'm working with
> 
> 
> If there were a compiler option to compile with the definition
> "every string is a UnicodeString, like in Java, C#, Delphi
> and all of them", that would be great.
> Then we could compile the compiler and Lazarus with the same type of
> string and everything would work.

The RTL in particular is not ready for String = UnicodeString. So your
best bet is to use UTF8String or to set the default code page to UTF8 (the
LCL units do that by default if I remember correctly, but Ondrej can
confirm or deny that).
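
A minimal sketch of the two options, assuming FPC 3.0 or later. The program
name and variables are made up for illustration. SetMultiByteConversionCodePage,
DefaultSystemCodePage and CP_UTF8 come from the System unit. Whether the LCL
makes this call for you is the open question above, so do not take this as a
description of what the LCL does.

```pascal
program Utf8CodePageDemo;

{$mode objfpc}{$H+}

var
  U: UTF8String;   // option 1: always UTF-8, independent of the default
  S: String;       // AnsiString(CP_ACP): follows DefaultSystemCodePage
begin
  WriteLn('Default code page at startup: ', DefaultSystemCodePage);

  // Option 2: make CP_ACP strings be treated as UTF-8 from here on.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('Default code page now: ', DefaultSystemCodePage);

  U := 'abc';
  // Both sides are UTF-8 now, so no re-encoding should be needed here.
  S := U;
  WriteLn(S);
end.
```

With the default set this way, plain String variables and UTF8String
variables should agree on their encoding, so you can keep using String in
your own code.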

> Will it be slower than now? Yes, maybe... but we already use objects!
> If you want 500% performance, use pointers, records and procedures
> with whatever encoding you want. But if you use objects, the overhead
> already exists... and who cares? 1ms... 2ms... even 2s that you may
> lose using UTF16? (or UTF8, but make it all the same!) So? The world is
> using Ruby and they don't care... or Python, Java... and they store in
> UTF16 too, which requires double the space... but that it works and the
> code is clean should be more important, don't you agree?

For FPC, more restricted targets (AVR, DOS, etc.) also have to be kept in
mind. So the RTL will be adjusted so that it can easily be compiled either
with String = UnicodeString or, as now, with String = AnsiString(CP_ACP).
But we are not there yet.

Regards,
Sven

