[Lazarus] Removed use of UTF8String in Lazarus to work with cpstrnew

Hans-Peter Diettrich DrDiettrich1 at aol.com
Mon Sep 19 14:05:31 CEST 2011


Paul Ishenin wrote:

>> Why not apply the same to AnsiString and change all to String since
>> Lazarus does not work with Ansi code pages anyway?
> 
> Lazarus works with strings which have 1 byte per element. If FPC later
> switches the default string type to UnicodeString, Lazarus will suddenly
> get many problems.

Choosing UTF-8 for what Delphi treats as Ansi strings is the first
incompatibility. Shouldn't we cure it by following the new Delphi
Unicode model? Otherwise a string-type incompatibility is added on top
of the string-encoding incompatibility :-(

If FPC starts to dictate inappropriate rules[1], I see no way around
forking into an Ansi/UTF-8 branch and a UTF-16/Unicode branch[2],
mirroring the break in Delphi. The old branch would then have to stay
with an older compiler (the current release), while the new branch
would require the new compiler.

[1] The FPC developers are currently trying to find a model that meets
both needs, compatibility with Delphi *and* with Lazarus; we'll have to
wait for the outcome.

[2] IMO it should be possible to separate user-land strings from
platform/widgetset strings. Then all components can continue to use
UTF-8 internally (when talking to the widgetsets), while the
user-accessible strings can be of another type. With a well-chosen
internal boundary, both the number of unnecessary conversions and the
required changes to the LCL code can be kept to a minimum.
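The boundary idea in footnote [2] can be sketched as a small Python model. This is not LCL or FPC code: the classes Widgetset and Component, the draw_text method and the caption property are hypothetical stand-ins. The component stores its text as UTF-8 and hands it to the widgetset unchanged; only the user-facing property converts, so repeated repaints cost no conversions.

```python
"""Hypothetical model of footnote [2]: UTF-8 inside components, another
string type at the user-facing boundary. Not LCL or FPC code."""


class Widgetset:
    """Hypothetical widgetset API that consumes UTF-8 bytes directly."""

    def __init__(self) -> None:
        self.drawn = []  # UTF-8 byte strings received from components

    def draw_text(self, text_utf8: bytes) -> None:
        # No conversion here: the internal format already matches.
        self.drawn.append(text_utf8)


class Component:
    """Stores its caption as UTF-8; exposes it to users as a Unicode str."""

    def __init__(self, ws: Widgetset) -> None:
        self._ws = ws
        self._caption_utf8 = b""
        self.conversions = 0  # counts encode/decode at the user boundary

    @property
    def caption(self) -> str:
        self.conversions += 1  # internal UTF-8 -> user string
        return self._caption_utf8.decode("utf-8")

    @caption.setter
    def caption(self, value: str) -> None:
        self.conversions += 1  # user string -> internal UTF-8
        self._caption_utf8 = value.encode("utf-8")

    def paint(self) -> None:
        # Component-to-widgetset traffic stays in UTF-8: no conversion.
        self._ws.draw_text(self._caption_utf8)


if __name__ == "__main__":
    ws = Widgetset()
    button = Component(ws)
    button.caption = "Größe"  # one conversion (user -> internal)
    for _ in range(3):
        button.paint()  # repaints cost no conversions
    print(button.conversions)  # 1
```

Where exactly the user-facing type differs from UTF-8 (here a Python str standing in for a UnicodeString) is the design choice the footnote leaves open.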


>> For example, if UTF8ToUTF16 were left to accept UTF8String, I would
>> think it would force the parameter to have the UTF-8 code page, which
>> would be more correct. And this is what I don't understand: how will
>> it break if UTF8String is kept?
> 
> The compiler adds implicit code page conversions for string arguments.
> I had to avoid that. The better choice would be the RawByteString type,
> but it is not defined in FPC 2.4.4, which we need to support.

IMO using RawByteString will not help much, except for (possibly)
simpler code and fewer overloaded procedures. Avoiding implicit
conversions will instead require *fixed* string types and encodings for
different tasks with different needs. E.g. a TFileName string type would
allow all conversions to be eliminated when a string is known (by
design) to hold file or path names. Likewise, an LCLString (widget,
component) type could do the same for the LCL widgetset interface. The
FPC decisions about string container classes (TStrings...) will
determine where to draw the line between user and widget string types.
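The difference between a parameter with a declared code page, a RawByteString parameter and a fixed-encoding type can be sketched as a Python model. It is not FPC code: TaggedString, pass_argument and the CODECS table are hypothetical, None stands in for a RawByteString parameter, and the TFileName case assumes the file API expects UTF-8.

```python
"""Hypothetical model of code-page-tagged strings and where implicit
conversions happen when they are passed as arguments. Not FPC code."""

from dataclasses import dataclass
from typing import Optional, Tuple

# Code page numbers mapped to Python codec names (model assumption).
CODECS = {65001: "utf-8", 1252: "cp1252", 28591: "latin-1"}
CP_UTF8 = 65001


@dataclass(frozen=True)
class TaggedString:
    data: bytes    # raw bytes of the string
    codepage: int  # code page the bytes are encoded in


def pass_argument(arg: TaggedString,
                  param_cp: Optional[int]) -> Tuple[TaggedString, bool]:
    """Model passing arg to a parameter declared with code page param_cp.

    param_cp=None models a RawByteString parameter: the argument arrives
    as-is. A declared code page that differs from the argument's own
    forces a conversion. Returns (value seen by callee, converted?).
    """
    if param_cp is None or arg.codepage == param_cp:
        return arg, False
    text = arg.data.decode(CODECS[arg.codepage])
    return TaggedString(text.encode(CODECS[param_cp]), param_cp), True


if __name__ == "__main__":
    ansi = TaggedString("Größe".encode("cp1252"), 1252)
    for label, cp in [("UTF8String", CP_UTF8), ("RawByteString", None)]:
        value, converted = pass_argument(ansi, cp)
        print(label, value.codepage, converted)
    # Hypothetical fixed-encoding TFileName whose encoding matches the
    # (assumed UTF-8) file API: passing it never converts.
    path = TaggedString("/tmp/Größe.txt".encode("utf-8"), CP_UTF8)
    _, converted = pass_argument(path, CP_UTF8)
    print("TFileName", converted)
```

The RawByteString case avoids the conversion at the call, but a callee that needs UTF-8 internally would still have to convert the cp1252 bytes itself; only a type whose encoding is fixed to match its consumer removes the conversion entirely.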

DoDi
