[Lazarus] UTF8 RTL for Windows

Hans-Peter Diettrich DrDiettrich1 at aol.com
Mon Nov 24 22:15:29 CET 2014


luiz americo pereira camara wrote:

> When DefaultSystemCodePage is CP_ACP the variable S will have 
> UTF-8 content but the encoding will be ACP (in my case 1252), just 
> like it is today.
> With DefaultSystemCodePage as CP_UTF8 both content and code page will match.

The Delphi (and FPC) encoding model allows the static (declared) 
encoding of a string to differ from its dynamic (actual content) 
encoding; see the special handling of RawByteString (Wiki).

So it's not a good idea to simply *assume* that a string variable 
contains bytes of the declared encoding. Specifically, one should check 
or force the correct dynamic encoding of every string variable before 
searching for specific bytes (chars) in it.
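
A minimal sketch of such a check, assuming an FPC version with code page 
aware strings (where StringCodePage and SetCodePage are available in the 
System unit). EnsureUTF8 is a hypothetical helper name chosen for this 
illustration, not an existing RTL function, and the choice of UTF-8 as the 
target encoding is only an assumption for the example:

```pascal
program DynEncoding;
{$mode objfpc}{$H+}

{ Hypothetical helper: returns a copy of S whose dynamic code page is UTF-8.
  RawByteString is used so that passing the string in does not itself
  trigger an implicit conversion. }
function EnsureUTF8(const S: RawByteString): RawByteString;
begin
  Result := S;
  { StringCodePage reports the dynamic code page stored with the string data. }
  if StringCodePage(Result) <> CP_UTF8 then
    { Convert = True asks the RTL to translate the bytes as well,
      not merely relabel the string. }
    SetCodePage(Result, CP_UTF8, True);
end;

var
  Raw: RawByteString;
begin
  Raw := 'Lazarus';
  Raw := EnsureUTF8(Raw);
  { Only after this point is a byte-wise search for UTF-8 sequences meaningful. }
  WriteLn('dynamic code page: ', StringCodePage(Raw));
end.
```

Note that this only helps when the dynamic code page is itself correct. If 
a string carries UTF-8 bytes under a different code page label, a 
converting SetCodePage call would mangle the content, and relabelling 
without conversion (Convert = False) would be the appropriate fix instead.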

I'm missing documentation for working safely (and efficiently) with such 
irregular strings. Most probably none of the FPC (and Delphi) developers 
ever noticed how users are left alone with this problem :-(

DoDi




