[Lazarus] UTF8 RTL for Windows

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Nov 25 13:10:26 CET 2014


Mattias Gaertner wrote:
> On Mon, 24 Nov 2014 22:15:29 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
> 
>> [...]
>> The Delphi (and FPC) encoding model allows for strings of different 
>> static (declared) and dynamic (true content) encoding, see the special 
>> handling of RawByteString (Wiki).
>>
>> It is therefore not a good idea to simply *assume* that a string variable 
>> contains bytes of the declared encoding. In detail, one should check or 
>> force the right dynamic encoding of every string variable before 
>> searching for specific bytes (chars) in it.
>>
>> I'm missing documentation for working safely (and efficiently) with such 
>> irregular strings. Most probably none of the FPC (and Delphi) developers 
>> ever noticed how users are left alone with this problem :-(
> 
> Maybe I don't understand the question, but it seems to me this is
> documented where static-, dynamic cp and rawbytestring are explained.

More concrete questions:

How can a user be sure that a string parameter in a subroutine really 
has the specified (declared) encoding?
How can this be checked, and how can it be fixed if needed?
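
To make the question concrete, here is a minimal sketch of the check-and-fix
I have in mind. It assumes an FPC version with codepage-aware strings
(2.7.1 or later) and uses only the RTL routines StringCodePage and
SetCodePage. The names Process and Local are hypothetical, made up for this
illustration. Whether such a guard is really needed in every subroutine is
exactly the open question.

```pascal
program CheckDynamicCodePage;

{$mode objfpc}{$H+}

// Hypothetical subroutine that wants UTF-8 bytes before searching in them.
procedure Process(const Param: UTF8String);
var
  Local: UTF8String;
begin
  Local := Param;
  // StringCodePage reports the dynamic code page stored with the string data.
  WriteLn('dynamic code page on entry: ', StringCodePage(Local));
  if (Local <> '') and (StringCodePage(Local) <> CP_UTF8) then
    // Convert = True re-encodes the bytes; False would only relabel them.
    SetCodePage(RawByteString(Local), CP_UTF8, True);
  // From here on a byte-wise search relies on checked, not assumed, UTF-8.
  WriteLn('position of "b": ', Pos('b', Local));
end;

var
  S: UTF8String;
begin
  S := 'abc';
  Process(S);
end.
```

Note that a conversion can only be meaningful when the dynamic code page on
entry describes the bytes correctly. A string labelled CP_NONE, or labelled
wrongly, cannot be repaired this way.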


> 
> http://wiki.freepascal.org/FPC_Unicode_support#Ansistring
> 
> When a procedure requires a specific encoding it uses a specific String
> type. If it works with CP_ACP it uses "String". If it needs UTF8 it
> uses UTF8String.

Such specifications are meaningless when a string parameter can have a 
dynamic encoding that differs from its declared one :-(

Unicode Delphi works well as long as only one codepage (CP_ACP) is used 
in addition to Unicode (UTF-16) strings. As soon as multiple codepages 
can be involved at the same time, the dynamic string encodings become 
almost random (observed in Delphi XE). FPC already has multiple 
built-in codepage variables (DefaultSystemCodePage...) with possibly 
different values. The mess observed in Delphi is therefore inevitable 
as long as RawByteString results (e.g. of standard string handling 
functions) are *not* converted when assigned to a string variable of 
some specific static encoding.
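
The situation can at least be observed with a small test program. The
following is only a sketch, again assuming FPC 2.7.1 or later. The type
TLatin1String and the function PassThrough are hypothetical stand-ins for
"a string type with some specific static encoding" and "a routine returning
RawByteString". The program makes no claim about the result; it just prints
what the runtime reports, so the declared code page (1252) can be compared
with the dynamic one.

```pascal
program ObserveRawByteAssign;

{$mode objfpc}{$H+}

type
  // Illustrative type only: an AnsiString with static code page 1252.
  TLatin1String = type AnsiString(1252);

// Stand-in for any routine with a RawByteString result;
// it simply hands its argument back unchanged.
function PassThrough(const S: RawByteString): RawByteString;
begin
  Result := S;
end;

var
  Utf: UTF8String;
  Latin: TLatin1String;
begin
  Utf := 'abc';
  // RawByteString result assigned to a variable of a specific static encoding.
  Latin := PassThrough(Utf);
  WriteLn('declared 1252, dynamic code page: ', StringCodePage(Latin));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```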

Unfortunately I have not been able to test Lazarus trunk for a long time, 
and my request for assistance got no answer. So I have to wait for the 
next installable download before I can give concrete examples.

DoDi




