[Lazarus] Removed use of UTF8String in Lazarus to work with cpstrnew

Paul Ishenin ip at kmiac.ru
Mon Sep 19 11:35:34 CEST 2011


19.09.2011 17:06, cobines wrote:
> 2011/9/19 Paul Ishenin<webpirat at mail.ru>:
>> Lazarus must use either UTF8String everywhere or nowhere.
> I thought that Lazarus will continue to use UTF-8 on all platforms and
> String will mean String<65001>  and it will be interchangeable with
> UTF8String. Is that not so?

Whether it will or not does not matter. If you use UTF8String in one 
place and AnsiString in another, the compiler will convert the 
UTF8String to the default system codepage wherever the two meet. For 
example, on my Windows machine it converted a UTF-8 string to codepage 
1251. Keeping all strings in the default codepage prevents this 
automatic conversion.
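
To make the byte-level effect concrete, here is a small sketch in Python 
(not Pascal, and not what the compiler literally does) that mimics a 
conversion from UTF-8 to codepage 1251. The helper name 
to_system_codepage and the sample strings are made up for this 
illustration.

```python
# Illustration only: mimics an implicit UTF-8 -> system codepage conversion.
# "cp1251" stands in for the default system codepage mentioned above.

def to_system_codepage(utf8_bytes: bytes, codepage: str = "cp1251") -> bytes:
    """Transcode UTF-8 bytes to a single-byte codepage (hypothetical helper)."""
    text = utf8_bytes.decode("utf-8")
    # Characters missing from the target codepage are replaced with '?'.
    return text.encode(codepage, errors="replace")

cyrillic = "Привет".encode("utf-8")        # 12 bytes, two per letter
converted = to_system_codepage(cyrillic)   # 6 bytes, one per letter

chinese = "中文".encode("utf-8")            # not representable in cp1251
lossy = to_system_codepage(chinese)        # information is lost here

print(len(cyrillic), len(converted))
print(lossy)
```

The bytes change even when every character survives, and text outside 
the target codepage is lost entirely.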

> Why change UTF8String to AnsiString and not String, like almost every
> other string parameter?

The argument was UTF8String. UTF8String was previously declared as type 
AnsiString in the RTL. Why should I choose a "string" type, which 
depends on compiler switches and source code modes?

> Why not apply the same to AnsiString and change all to String since
> Lazarus does not work with Ansi code pages anyway?

Lazarus works with strings which have 1 byte per element. If FPC later 
switches the default string type to UnicodeString, Lazarus will 
suddenly get many problems.
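
The difference in element size can be sketched as follows (Python, for 
illustration only; the sample text is arbitrary). The same text has a 
different length and different indexing depending on whether elements 
are UTF-8 bytes or UTF-16 code units, so code written for one 
assumption misbehaves under the other.

```python
text = "Привет"

utf8_units = text.encode("utf-8")        # 1 byte per element
utf16_bytes = text.encode("utf-16-le")   # 2 bytes per element
utf16_units = len(utf16_bytes) // 2

print(len(utf8_units), utf16_units)

# Taking "the first two elements" means different things in each case.
first_two_bytes = utf8_units[:2].decode("utf-8")        # two UTF-8 bytes
first_two_units = utf16_bytes[:4].decode("utf-16-le")   # two UTF-16 units

print(first_two_bytes, first_two_units)
```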

> For example, if UTF8ToUTF16 was left to accept UTF8String I would
> think it would force the parameter to have UTF-8 code page, which
> would be more correct. And this is what I don't understand, how will
> it break when UTF8String is left.

The compiler adds an implicit codepage conversion for string arguments. 
I had to avoid that. The better choice would be to use the 
RawByteString type, but it is not defined in FPC 2.4.4, which we need 
to support.
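
A rough sketch of why a conversion on the way in breaks a function such 
as UTF8ToUTF16, again in Python rather than Pascal. The function below 
is a stand-in written for this illustration, not the Lazarus 
implementation, and cp1251 is assumed as the system codepage.

```python
def utf8_to_utf16(raw: bytes) -> bytes:
    """Stand-in for a UTF-8 to UTF-16 converter; takes the bytes as they are."""
    return raw.decode("utf-8").encode("utf-16-le")

original = "Привет".encode("utf-8")

# Bytes passed through untouched (what a raw byte string parameter gives).
ok = utf8_to_utf16(original)

# Bytes transcoded to cp1251 before the call (the implicit conversion case).
transcoded = original.decode("utf-8").encode("cp1251")
try:
    utf8_to_utf16(transcoded)
    failed = False
except UnicodeDecodeError:
    # The cp1251 bytes are no longer valid UTF-8.
    failed = True

print(len(ok), failed)
```

Python raises an error here; what a Pascal routine does with such input 
depends on its implementation, but the input is wrong either way.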

Best regards,
Paul Ishenin.




