[Lazarus] Removed use of UTF8String in Lazarus to work with cpstrnew
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Mon Sep 19 14:05:31 CEST 2011
Paul Ishenin wrote:
>> Why not apply the same to AnsiString and change all to String since
>> Lazarus does not work with Ansi code pages anyway?
>
> Lazarus works with strings which have 1 byte per element. If FPC later
> switches the default string type to UnicodeString, Lazarus will
> suddenly get many problems.
The choice of UTF-8 for what Delphi treats as Ansi strings is the first
incompatibility. Shouldn't we cure it by following the new Delphi
Unicode model? Otherwise a string-type incompatibility is added on top
of the string-encoding incompatibility :-(
When FPC starts to dictate inappropriate rules[1], I see no way around a
fork into an Ansi/UTF-8 and a UTF-16/Unicode branch[2], mirroring the
break in Delphi. This would mean that the old branch has to stick with
an older compiler (the current release), while the new branch requires
the new compiler.
[1] The FPC developers are currently trying to find a model that fits
both needs, compatibility with Delphi *and* Lazarus - we'll have to wait
for the outcome.
[2] IMO it should be possible to separate user-land strings from
platform/widgetset strings. Then all components can continue to use
UTF-8 internally (in talking to the widgetsets), while the user
accessible strings can be of another type. With a proper choice of that
internal boundary, the number of excess conversions can be kept at a
minimum, as well as the required changes to the LCL code.
>> For example, if UTF8ToUTF16 was left to accept UTF8String I would
>> think it would force the parameter to have UTF-8 code page, which
>> would be more correct. And this is what I don't understand, how will
>> it break when UTF8String is left.
>
> The compiler adds implicit codepage conversion for string arguments. I
> had to avoid that. The better choice would be to use the RawByteString
> type, but it is not defined in fpc 2.4.4 which we need to support.
IMO the use of RawByteString will not help much, except (possibly) for
simpler code and fewer overloaded procedures. Avoiding implicit
conversions will instead require *fixed* string types and encodings, for
different tasks with different needs. E.g. a TFileName string type would
make it possible to eliminate all conversions when a string is known (by
design) to hold file or path names. Likewise an LCLString (widget,
component) type could do the same for the LCL widgetset interface. The
FPC decisions about string container classes (TStrings...) will tell
where to put the dividing line between user and widget string types.
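A rough sketch of the idea, assuming a compiler with code-page-aware
strings (the cpstrnew work, as later shipped in FPC 3.0). TLCLString,
SetCaption and Dump are made-up names for illustration only, not
existing LCL or RTL identifiers:

```pascal
program FixedStringTypes;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the LCLString idea: a string type whose
  // declared code page is fixed to UTF-8.
  TLCLString = type AnsiString(CP_UTF8);

// Hypothetical widget-side routine. The parameter type matches the
// argument's declared code page, so there is nothing to convert.
procedure SetCaption(const S: TLCLString);
begin
  WriteLn('widget side: ', Length(S), ' bytes, code page ', StringCodePage(S));
end;

// RawByteString takes any 1-byte-per-element string as it is, whatever
// its code page, so a single routine can serve all of them.
procedure Dump(const S: RawByteString);
begin
  WriteLn('raw: ', Length(S), ' bytes, code page ', StringCodePage(S));
end;

var
  U: UnicodeString;   // user-land string (UTF-16)
  L: TLCLString;      // widget-side string (UTF-8)
begin
  U := 'Gr';
  U := U + WideChar($00FC) + 'n';  // add a non-ASCII character
  L := UTF8Encode(U);  // the single explicit conversion, at the boundary
  SetCaption(L);       // stays UTF-8 from here on
  Dump(L);
end.
```

The point is only where the conversion sits: once, at the line between
user and widget strings, instead of implicitly at every call.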
DoDi