[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Marcos Douglas md at delfire.net
Thu Dec 19 22:58:18 CET 2013


On Wed, Dec 18, 2013 at 3:25 PM, Juha Manninen
<juha.manninen62 at gmail.com> wrote:
> On Wed, Dec 18, 2013 at 5:19 PM, Marcos Douglas <md at delfire.net> wrote:
>> Here too, more or less... I'm thinking of switching all my own packages
>> to UTF-8. But in your code, how do you work with Delphi -- or with
>> Lazarus on Windows -- using your core parts? Are there many calls to
>> SysToUTF8 and/or UTF8ToSys between the core and Windows?
>
> If you need to call WinAPI, then obviously you must convert.
> In our case API calls are not needed by the core program. It is
> cross-platform code. However using a new Unicode-Delphi would cause
> many problems because all VCL functions and classes, including
> TStringList, expect UTF-16 strings. When using UTF8String, the compiler
> converts between encodings all the time.

So when using UTF8String, does the compiler convert to UTF-16 automatically?

> UTF-8 is needed in many places, thus we would need to duplicate much
> of VCL code for UTF-8. No good.
> Using UTF-8 with FPC/Lazarus would simplify the task. LCL classes and
> functions work as expected etc.
> I was even presented a possibility of doing a hybrid Ansi/UTF-8 system
> and a gradual data conversion plan.
> If Length(s) = UTF8Length(s), then the string is an AnsiString, and so on...
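
A minimal sketch of the length comparison quoted above, so the idea is concrete. CountCodePoints is a hypothetical stand-in for UTF8Length from LazUTF8, written here only so the program compiles without Lazarus packages, and LooksSingleByte is likewise just an illustrative name. It assumes the string is either plain ASCII or valid UTF-8; real ANSI data with bytes in the $80..$BF range would be misjudged, so treat it as a heuristic rather than a reliable test.

```pascal
program HybridCheck;
{$mode objfpc}{$H+}

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Hypothetical stand-in for UTF8Length; assumes S is valid UTF-8.
function CountCodePoints(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

// True when the byte count equals the code point count,
// i.e. no multi-byte sequence was seen.
function LooksSingleByte(const S: AnsiString): Boolean;
begin
  Result := Length(S) = CountCodePoints(S);
end;

begin
  // 'cafe' is plain ASCII; the second string ends in U+00E9 as two UTF-8 bytes.
  Writeln(LooksSingleByte('cafe'));
  Writeln(LooksSingleByte('caf'#$C3#$A9));
end.
```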

I was thinking of, e.g., overriding the RTL classes and functions:
type
  TStringList = class(Classes.TStringList)
    // using UTF-8
  end;

> If you call WinAPI a lot, then with UTF-8 you must convert encodings.
> But, if you are calling WinAPI a lot, then you are in trouble anyway.
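
For reference, a rough sketch of what one such conversion looks like at a single call site, assuming the text is held as UTF-8 in an AnsiString and the wide ("W") variant of the API is called. The program name and variable names are made up for illustration; UTF8Decode comes from the RTL and MessageBoxW from the Windows unit. The API call is wrapped in an IFDEF so the program still compiles on other platforms.

```pascal
program WideCallDemo;
{$mode objfpc}{$H+}

{$IFDEF WINDOWS}
uses
  Windows;
{$ENDIF}

var
  Utf8Text: AnsiString;
  WideText: UnicodeString;
begin
  // 'café' with the last character stored as two UTF-8 bytes
  Utf8Text := 'caf'#$C3#$A9;

  // Convert once, right at the boundary to the OS
  WideText := UTF8Decode(Utf8Text);
  Writeln(Length(Utf8Text), ' bytes, ', Length(WideText), ' UTF-16 code units');

  {$IFDEF WINDOWS}
  // The W functions take UTF-16, so pass the converted string
  MessageBoxW(0, PWideChar(WideText), PWideChar(WideText), MB_OK);
  {$ENDIF}
end.
```

Keeping the conversion at the boundary like this means the core code only ever sees UTF-8, at the price of one UTF8Decode per call.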

I disagree, if you are writing a program that only runs on Windows.  ;-)

> As Michael Van Canneyt wrote, backwards compatibility with UTF-8 is
> good. For example, all our lower-ASCII data will work without
> conversions.
> Also, lots of code that is not designed for Unicode will continue to
> work with UTF-8 but not with UTF-16. For example, parsers for common
> markup languages (HTML, XML, BB-code) still magically work because all
> tags are in the lower-ASCII area.
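
A small sketch of why that works: in UTF-8 every byte of a multi-byte sequence has its high bit set, so an ASCII byte such as '<' can never appear in the middle of a character, and a byte-oriented search for a tag stays correct. ExtractTag is a hypothetical helper written for this example, not something from the RTL or LCL.

```pascal
program TagScan;
{$mode objfpc}{$H+}

// Returns the bytes between the first <Tag> and </Tag>, or '' if not found.
// Works purely on bytes and knows nothing about UTF-8.
function ExtractTag(const Doc, Tag: AnsiString): AnsiString;
var
  OpenTag, CloseTag: AnsiString;
  StartPos, EndPos: Integer;
begin
  Result := '';
  OpenTag := '<' + Tag + '>';
  CloseTag := '</' + Tag + '>';
  StartPos := Pos(OpenTag, Doc);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(OpenTag));
  EndPos := Pos(CloseTag, Doc);
  if EndPos < StartPos then
    Exit;
  Result := Copy(Doc, StartPos, EndPos - StartPos);
end;

var
  Doc, Inner: AnsiString;
begin
  // '<b>naïve café</b>' with ï and é written as UTF-8 byte pairs
  Doc := '<b>na'#$C3#$AF've caf'#$C3#$A9'</b>';
  Inner := ExtractTag(Doc, 'b');
  // The multi-byte characters come back intact: 12 bytes for 10 characters
  Writeln(Length(Inner));
end.
```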

I agree.

Marcos Douglas



