[Lazarus] UTF8 RTL for Windows
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Nov 25 13:10:26 CET 2014
Mattias Gaertner wrote:
> On Mon, 24 Nov 2014 22:15:29 +0100
> Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>
>> [...]
>> The Delphi (and FPC) encoding model allows for strings of different
>> static (declared) and dynamic (true content) encoding, see the special
>> handling of RawByteString (Wiki).
>>
>> So far it's not a good idea to simply *assume* that a string variable
>> contains bytes of the declared encoding. In detail one should check or
>> force the right dynamic encoding of every string variable, before
>> searching for specific bytes (chars) in it.
>>
>> I'm missing documentation for working safely (and efficiently) with such
>> irregular strings, most probably none of the FPC (and Delphi) developers
>> ever noticed how users are left alone with this problem :-(
>
> Maybe I don't understand the question, but it seems to me this is
> documented where static-, dynamic cp and rawbytestring are explained.
More concrete questions:
How can a user be sure that a string parameter in a subroutine really
has the specified encoding?
How can this be checked, and how can it be fixed if needed?
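To illustrate the kind of check meant here, a minimal sketch, assuming
that StringCodePage and SetCodePage from the System unit behave as
described on the Wiki for FPC 2.7.1 and later. The helper name
EnsureUTF8 is hypothetical, not an RTL routine, and the test string is
made up for the demonstration:

```pascal
program CheckEncoding;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the RTL): make sure that S really
// carries UTF-8 as its dynamic encoding, converting the bytes if not.
procedure EnsureUTF8(var S: RawByteString);
begin
  // Skip empty strings: they have no header that could be retagged.
  if (Length(S) > 0) and (StringCodePage(S) <> CP_UTF8) then
    SetCodePage(S, CP_UTF8, True);  // True = also convert the content
end;

var
  A: RawByteString;
begin
  A := 'abc';
  // Simulate an "irregular" string: tag the bytes as CP1252 without
  // converting them (Convert = False only changes the dynamic codepage).
  SetCodePage(A, 1252, False);
  WriteLn('dynamic codepage before: ', StringCodePage(A));

  EnsureUTF8(A);
  WriteLn('dynamic codepage after:  ', StringCodePage(A));
end.
```

Even if this works as assumed, it only shows the check for a single
variable. Every subroutine would have to do the same for every string
parameter, which is exactly the overhead in question.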
>
> http://wiki.freepascal.org/FPC_Unicode_support#Ansistring
>
> When a procedure requires a specific encoding it uses a specific String
> type. If it works with CP_ACP it uses "String". If it needs UTF8 it
> uses UTF8String.
Such specifications are meaningless when the string parameters can have
a different dynamic encoding :-(
Unicode Delphi works well as long as only one codepage (CP_ACP) is used
in addition to Unicode (UTF-16) strings. As soon as multiple codepages
are involved at the same time, the dynamic string encodings become
almost random (observed in Delphi XE). FPC already has multiple
built-in codepage variables (DefaultSystemCodePage...), possibly with
different values. The mess observed in Delphi is therefore inevitable
as long as RawByteString results (e.g. of standard string handling
functions) are *not* converted when they are assigned to a string
variable with a specific static encoding.
Unfortunately I have not been able to test Lazarus trunk for a long
time, and my request for assistance went unanswered. So I have to wait
for the next installable download before I can give concrete examples.
DoDi