[Lazarus] UTF8String and UTF8Delete

Sven Barth pascaldragon at googlemail.com
Sat Dec 12 18:20:37 CET 2015


On 12.12.2015 12:46, Jürgen Hestermann wrote:
> On 2015-12-11 at 19:14, Sven Barth wrote:
>  > Windows uses multi-byte strings (one or more bytes per character)
>  > and UTF-16 (mostly 2 bytes per character, 4 for surrogate pairs).
>  > The functions WideCharToMultiByte and MultiByteToWideChar which
>  > are also used inside FPC for string conversions both take a
>  > CodePage parameter that can also be CP_UTF8.
>
> As far as I know, (current) Windows versions only use UTF-16 internally.
> But they provide the old legacy ANSI functions too (which convert to UTF-16).
> MultiByteToWideChar and WideCharToMultiByte are just helper
> functions to convert from arbitrary encodings to UTF-16 (and back).
> But UTF-8 is not used anywhere internally in Windows (not even ANSI anymore,
> except the legacy functions which convert to and from UTF-16) and
> you cannot use UTF-8 as string encoding for Windows API functions.
> Otherwise we would not have this problem and could use UTF-8 as
> a standard for everything.

Yes, internally Windows uses UTF-16. But if you set your Windows ANSI
code page, or at least the current thread's locale, to UTF-8 (indirectly,
by choosing a locale that has UTF-8 as its code page; I don't know of one
right now, though), then the *A functions *do* work with UTF-8. They
simply use the current locale's code page to convert from ANSI to
Unicode, and in this case ANSI includes UTF-8.
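
To illustrate the conversion step described above, here is a minimal,
portable sketch in Python. It does not call the Windows API: Python's
codecs stand in for MultiByteToWideChar and WideCharToMultiByte, and the
helper names ansi_to_utf16 and utf16_to_ansi are hypothetical, chosen only
for this example. It shows that the same bytes round-trip correctly when
the assumed "ANSI" code page is UTF-8, and turn into mojibake when the
assumed code page is cp1252.

```python
# Sketch only: Python codecs stand in for the Windows conversion functions.
# The helper names below are hypothetical, not part of any real API.


def ansi_to_utf16(data: bytes, code_page: str) -> bytes:
    """Decode `data` using `code_page`, then encode it as UTF-16LE.

    This mirrors what an *A function is described as doing before it
    hands the string to the UTF-16 side.
    """
    return data.decode(code_page).encode("utf-16-le")


def utf16_to_ansi(data: bytes, code_page: str) -> bytes:
    """Decode UTF-16LE `data` and encode it using `code_page`."""
    return data.decode("utf-16-le").encode(code_page)


# U+00FC (u with diaeresis) is in the BMP; U+1F600 needs a surrogate pair.
text = "J\u00fcrgen \U0001F600"

utf8_bytes = text.encode("utf-8")
wide = ansi_to_utf16(utf8_bytes, "utf-8")

# 8 code points: 12 bytes in UTF-8, 18 bytes (9 code units) in UTF-16.
print(len(utf8_bytes), len(wide))

# Round trip with UTF-8 as the assumed "ANSI" code page is lossless.
assert utf16_to_ansi(wide, "utf-8") == utf8_bytes

# The same UTF-8 bytes interpreted with cp1252 are silently misread:
# the two bytes of U+00FC become two separate characters.
misread = ansi_to_utf16("J\u00fcrgen".encode("utf-8"), "cp1252")
print(len(misread) // 2)  # 7 UTF-16 code units instead of 6
```

The point of the sketch is only that the *A-style conversion is correct
exactly when the code page it assumes matches the encoding of the bytes
it is given.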

Regards,
Sven




