[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Hans-Peter Diettrich DrDiettrich1 at aol.com
Fri Dec 27 10:16:56 CET 2013


Juha Manninen wrote:
> It happened again. The word "Unicode" was mentioned and the result is
> an endless debate of how it should be done. Now > 100 messages and
> counting ...

Now that strings with a stored Encoding are in pre-release, the debate 
enters a whole new round.

> I personally don't care much what the default encoding will be, but I
> wonder how easy it will be to use UTF-8 for my employer's code.
> The situation with FPC will be better than with Delphi because FPC
> does not convert automatically to default encoding ALWAYS. It only
> converts when the conversion is needed.
> For example TStringList can be used for UTF8Strings and it does not
> trigger automatic conversion.
> Isn't it so? Please correct me if I still got it wrong.

That's the old state, where strings have no stored Encoding. As soon as 
AnsiStrings carry an encoding, the default encoding becomes important for 
keeping the number of automatic conversions down. When the RTL is 
converted to UTF-16, you'll have to accept either this new default 
encoding, or any number of automatic conversions between AnsiStrings and 
UnicodeStrings.
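
To make the cost of such automatic conversions concrete, here is a minimal 
sketch in Python (not FPC code). It assumes a hypothetical EncodedString 
record that stores its encoding next to the bytes, and a hypothetical 
Converter.assign that re-encodes only when source and target encodings 
differ. Neither name exists in the RTL; they only model the behaviour 
described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Bytes plus the encoding they are stored in (hypothetical model)."""
    data: bytes
    encoding: str


class Converter:
    """Counts how often an assignment forces a re-encoding."""

    def __init__(self) -> None:
        self.conversions = 0

    def assign(self, value: EncodedString, target_encoding: str) -> EncodedString:
        # Same encoding: the value is passed on untouched, no conversion.
        if value.encoding == target_encoding:
            return value
        # Different encoding: decode and re-encode, and count the conversion.
        self.conversions += 1
        text = value.data.decode(value.encoding)
        return EncodedString(text.encode(target_encoding), target_encoding)


conv = Converter()
utf8 = EncodedString("Grüße".encode("utf-8"), "utf-8")

same = conv.assign(utf8, "utf-8")      # UTF-8 -> UTF-8: nothing to do
wide = conv.assign(utf8, "utf-16-le")  # UTF-8 -> UTF-16: one conversion
back = conv.assign(wide, "utf-8")      # UTF-16 -> UTF-8: a second conversion

print(conv.conversions, len(utf8.data), len(wide.data))  # prints "2 7 10"
```

Every crossing between a UTF-8 string and a UTF-16 interface costs one 
conversion in this model, which is why the choice of default encoding 
decides how many of them an application pays for.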

> It means UTF-8 with FPC will be easier than UTF-8 with Delphi, even if
> UTF-16 was the default.

Delphi suffers from the use of CP_ACP, which was the only supported 
encoding before, and still is the only explicitly supported encoding 
when the AnsiString unit is used. In Lazarus we had the same "only one 
encoding" philosophy, except that here the default string type is UTF-8. 
With encoded AnsiStrings, the problem of other encodings and automatic 
conversion arises. Delphi solved most problems by changing "string" to 
UTF-16, so that only the forced use of AnsiString will ever result in 
automatic conversions due to different string encodings.
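
A small Python sketch (again not FPC or Delphi code) of why a single ANSI 
code page is a problem. It assumes cp1252 as a stand-in for CP_ACP, which 
in reality depends on the system locale, and uses "?" replacement for 
unmappable characters, which only approximates what Windows does. The 
helper names to_ansi and round_trip are hypothetical.

```python
def to_ansi(text: str, code_page: str = "cp1252") -> bytes:
    """Encode text in an ANSI code page; unmappable characters become '?'."""
    return text.encode(code_page, errors="replace")


def round_trip(text: str, code_page: str = "cp1252") -> str:
    """Convert to the ANSI code page and back again."""
    return to_ansi(text, code_page).decode(code_page)


# All characters exist in cp1252, so the round trip is lossless.
print(round_trip("Grüße") == "Grüße")   # prints "True"

# The Greek capital omega is not in cp1252, so it is silently replaced.
print(round_trip("Ωmega") == "?mega")   # prints "True"

# A Unicode encoding such as UTF-8 keeps both strings intact.
print("Ωmega".encode("utf-8").decode("utf-8") == "Ωmega")  # prints "True"
```

An automatic conversion into the ANSI code page can therefore lose data, 
while conversions between UTF-8 and UTF-16 only cost time.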

In FPC/Lazarus the situation is somewhat different, because now the 
default string type could be UTF-8, UTF-16 or even CP_ACP, with a number 
of users voting for each of them. Technically the simplest solution 
would be to keep the de-facto standard UTF-8, as assumed by Lazarus. But 
if "string" becomes UTF-16, as in recent Delphi versions, Lazarus and 
the LCL will need heavy refactoring. That's the top discussion topic 
right now.

DoDi




