[Lazarus] UTF8 RTL for Windows

Michael Van Canneyt michael at freepascal.org
Sat Nov 22 17:38:33 CET 2014



On Sat, 22 Nov 2014, Mattias Gaertner wrote:

> On Sat, 22 Nov 2014 16:18:09 +0100
> Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
>
>> On 2014-11-22 at 15:06, Mattias Gaertner wrote:
>>  > procedure TForm1.FormCreate(Sender: TObject);
>>  > var s: string; // String = AnsiString because of $H+
>>  > begin
>>  >   s:=GetCommandLineW;
>>  >   // GetCommandLineW returns a UTF-16 PWideChar
>>  >   // the compiler adds code to convert this to the
>>  >   // default system codepage (CP_ACP = CP_UTF8)
>>  >   // the resulting string has StringCodePage CP_ACP
>>  >   // and is encoded in UTF-8.
>>  >   // therefore you can simply use it with the LCL
>> 
>> Okay.
>> Does that mean that the compiler *always* assumes that
>> String=UTF-8 encoded AnsiString 
>
> Yes, with the UTF8 RTL. The default RTL uses system codepage.

Careful, there is no such thing as the "UTF8 RTL".

There is now a "Unicode and CodePage-aware RTL".

That means it has:
- Codepage aware single-byte strings.
   The codepage of a string may, or may not, be UTF8 (i.e. Unicode).
- Widestrings (unicode).
The compiler handles conversion between codepages transparently.

The codepage aware single-byte strings are not automatically UTF-8.
On Linux, this is probably so. But on Windows, this is not necessarily so.
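
To make the transparent conversion concrete, here is a minimal sketch. It assumes
a compiler with the codepage-aware RTL (the AnsiString(N) syntax). The type name
TCP1252String and the sample text are made up for this example only.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager, needed for conversions on Unix
  SysUtils;

type
  // Hypothetical type name, chosen for this sketch only.
  TCP1252String = type AnsiString(1252);

var
  U: UnicodeString;
  W: TCP1252String;
  S: UTF8String;
begin
  U := 'J?rgen';
  U[2] := WideChar($00FC);  // replace the placeholder with U+00FC

  W := U;  // the compiler inserts a UTF-16 -> CP1252 conversion
  S := U;  // the compiler inserts a UTF-16 -> UTF-8 conversion

  // Same text, different codepages, hence different byte lengths.
  Writeln('CP1252: codepage ', StringCodePage(W), ', bytes ', Length(W));
  Writeln('UTF-8 : codepage ', StringCodePage(S), ', bytes ', Length(S));
end.
```

If the conversions work as intended, I would expect the CP1252 copy to be one
byte shorter than the UTF-8 copy, since U+00FC takes one byte in CP1252 and two
in UTF-8. A plain String/AnsiString gets whatever DefaultSystemCodePage is,
which is exactly why it is not UTF-8 by definition.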

Additionally, most basic file I/O routines now correctly call the underlying
OS's file routines with the encoding the OS expects (which is WideString, i.e. UTF-16, on Windows).

The exact behaviour of the RTL is controlled by a few variables:
DefaultSystemCodePage, DefaultFileSystemCodePage, DefaultRTLFileSystemCodePage.
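
A small sketch to see what these variables contain on a given system. It only
reads the three variables named above and assumes a compiler with the
codepage-aware RTL. The program name is made up, and the values printed depend
on the platform and its settings.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

begin
  // Codepage used for plain AnsiString/String (CP_ACP) data.
  Writeln('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  // Codepage used when file names are handed to the OS.
  Writeln('DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  // Codepage of file names returned by the RTL file routines.
  Writeln('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
end.
```

A value of 65001 means UTF-8. On a typical Western European Windows setup the
system codepage would more likely be 1252, which is the case the warning above
is about.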

See http://wiki.freepascal.org/FPC_Unicode_support.

Michael.

