[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Marcos Douglas md at delfire.net
Wed Dec 25 19:34:40 CET 2013


On Wed, Dec 25, 2013 at 3:15 PM, Marco van de Voort <marcov at stack.nl> wrote:
> On Tue, Dec 24, 2013 at 12:22:41PM -0200, Marcos Douglas wrote:
>> > IMO the biggest group are old-fashioned Delphi (D7) users, who want their
>> > existing Ansi/VCL code base supported *without* complications and
>> > incompatibilities introduced by the newer Delphi versions. The subject of
>> > this thread clearly indicates that UTF-8 is *not* a solution for this group
>> > of users.
>>
>> I started this thread. My problem isn't using UTF-8 on Windows... my
>> problem is using different encodings in the same code, i.e., RTL <> LCL.
>
> Yes. But the choice of UTF-8, and the legacy concerns that come with it, are for
> Lazarus, and Lazarus alone.
>
>> Always using functions to convert strings between RTL and LCL and
>> vice versa is, IMHO, wrong, because the resulting code is confusing. In a
>> huge application you still need to ask yourself: "is this UTF-8 or
>> ANSI/UTF-16?"
>
> There are many hypothetical scenarios, and nothing is 100% certain, but it
> would at least be significantly better. It is already significantly better
> in trunk.

When you say it is better in trunk, do you mean only in the FPC context,
or are there improvements for Lazarus users too?

> The only problem on Windows is that you must pass strings with a clearly
> defined encoding to RTL functions.
>
> so
>
>  assignfile(f,s+s2+s3);
>
> is dangerous if s, s2 and s3 do not all have the same encoding. If there is
> any mismatch, the strings will be converted down to the default encoding.

Yes, but what is the difference between 2.6.2 and trunk in that case?

> It is defined, but somewhat special.
>
>> > That's my conclusion as well. But is that new audience worth abandoning the
>> > entire existing Lazarus audience?
>>
>> Of course nobody will abandon the entire existing Lazarus audience. If
>> the RTL becomes UTF-16, UTF-32 or whatever, Lazarus will continue, I
>> think, to work with UTF-8.
>
> There is no UTF-8 on Windows. One can try to mess with the default code page,
> but that will probably only cause a different kind of problem.
>
> On Windows there is only ANSI or UTF-16, or handling it manually.

You're right.
But if we imagine a perfect world where FPC and Lazarus use the same
encoding (it doesn't matter whether it is UTF-8 or UTF-16), everything would
work. Do you agree?
So, if UTF-8 were chosen for everything, the RTL would only need to convert
strings (on Windows) before calling API functions. The same would apply on
Linux (or any platform that uses UTF-8) if UTF-16 were chosen.

Is my thinking correct?
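A minimal sketch of that idea in Free Pascal, assuming FPC's UTF8Encode/UTF8Decode from the System unit: every string stays in UTF-8, only values of that one encoding are concatenated, and the path is decoded to UTF-16 exactly once, right before the platform call. CallWideApi is a hypothetical placeholder for a Windows "W" function such as CreateFileW; it only prints what it receives.

```pascal
program BoundaryDemo;

{$mode objfpc}{$H+}

{ Hypothetical stand-in for a Windows "W" API call (e.g. CreateFileW):
  it accepts only UTF-16 and just reports what it received. }
procedure CallWideApi(const Path: UnicodeString);
begin
  WriteLn('UTF-16 code units passed to API: ', Length(Path));
end;

var
  Dir, Name, Path8: UTF8String;  // one internal encoding: UTF-8
  U: UnicodeString;
begin
  // Build a name containing one non-ASCII character (U+00EF)
  SetLength(U, 1);
  U[1] := WideChar($00EF);
  Name := UTF8Encode(U) + '.txt';
  Dir := 'data/';

  // Both operands share the same encoding, so no implicit conversion happens
  Path8 := Dir + Name;
  WriteLn('UTF-8 bytes in path: ', Length(Path8));

  // Convert once, at the boundary to the (UTF-16) platform API
  CallWideApi(UTF8Decode(Path8));
end.
```

The non-ASCII character takes two bytes in UTF-8 but one UTF-16 code unit, so the program should report 11 bytes before the conversion and 10 code units after it.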


Marcos Douglas



