[Lazarus] UTF8 RTL for Windows
Michael Van Canneyt
michael at freepascal.org
Sat Nov 22 17:38:33 CET 2014
On Sat, 22 Nov 2014, Mattias Gaertner wrote:
> On Sat, 22 Nov 2014 16:18:09 +0100
> Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
>
>> Am 2014-11-22 um 15:06 schrieb Mattias Gaertner:
>> > procedure TForm1.FormCreate(Sender: TObject);
>> > var s: string; // String = AnsiString because of $H+
>> > begin
>> > s:=GetCommandLineW;
>> > // GetCommandLineW returns a UTF-16 PWideChar
>> > // the compiler adds code to convert this to the
>> > // default system codepage (CP_ACP = CP_UTF8)
>> > // the resulting string has StringCodePage CP_ACP
>> > // and is encoded in UTF-8.
>> > // therefore you can simply use it with the LCL
>>
>> Okay.
>> Does that mean that the compiler *always* assumes that
>> String=UTF-8 encoded AnsiString
>
> Yes, with the UTF8 RTL. The default RTL uses system codepage.
Careful, there is no such thing as the "UTF8 RTL".
There is now a "Unicode and CodePage-aware RTL".
That means it has:
- Codepage aware single-byte strings.
The codepage of a string may, or may not, be UTF8 (i.e. Unicode).
- Widestrings (unicode).
The compiler handles conversion between codepages transparently.
The codepage-aware single-byte strings are not automatically UTF-8.
On Linux, this is probably so. But on Windows, this is not necessarily so.
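To illustrate the transparent conversion, here is a minimal sketch. It assumes an FPC version with the codepage-aware string types (3.0 or a recent trunk) and a source file saved as UTF-8. The type name TLatin1String is made up for this example; 28591 is the codepage number for ISO-8859-1. On Unix a widestring manager such as cwstring is assumed to be needed for the runtime conversion.

```pascal
program CodePageConversionSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

{$ifdef unix}uses cwstring;{$endif}  // widestring manager for runtime conversions

type
  // Hypothetical name: an AnsiString with a fixed ISO-8859-1 codepage
  TLatin1String = type AnsiString(28591);

var
  u: UTF8String;
  l: TLatin1String;
begin
  u := 'é';
  // The declared codepages differ, so the compiler inserts a conversion here
  l := u;
  WriteLn('u: codepage ', StringCodePage(u), ', bytes ', Length(u));
  WriteLn('l: codepage ', StringCodePage(l), ', bytes ', Length(l));
end.
```

The same mechanism applies to a plain String: its codepage is whatever the default system codepage is at run time, so it should not be assumed to be UTF-8.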
Additionally, most basic file I/O routines now correctly call the underlying
OS's file routines with the encoding the OS expects (on Windows, that is the WideString, i.e. UTF-16, API).
The exact behaviour of the RTL is controlled by a couple of variables:
DefaultSystemCodePage, DefaultFileSystemCodePage, DefaultRTLFileSystemCodePage.
See http://wiki.freepascal.org/FPC_Unicode_support.
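A quick way to see what these variables hold on a given system is to print them. This is only a sketch, assuming an FPC version that has the codepage-aware RTL (3.0 or a recent trunk); the values depend on the platform and its settings, so none are shown here.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

var
  s: string;
begin
  // The three variables named above, all declared in the System unit
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage:    ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage: ', DefaultRTLFileSystemCodePage);

  // Codepage recorded in an ordinary string returned by the RTL
  s := ParamStr(0);
  WriteLn('StringCodePage(ParamStr(0)):  ', StringCodePage(s));
end.
```

Running this on both Linux and Windows should make the difference described above visible. The wiki page also covers how to change the defaults.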
Michael.