[Lazarus] UTF8 RTL for Windows

Mattias Gaertner nc-gaertnma at netcologne.de
Sun Nov 23 00:15:18 CET 2014


On Sat, 22 Nov 2014 17:38:33 +0100 (CET)
Michael Van Canneyt <michael at freepascal.org> wrote:

>[...]
> > Yes, with the UTF8 RTL. The default RTL uses system codepage.
> 
> Careful, there is no such thing as the "UTF8 RTL".
> 
> There is now a "Unicode and CodePage-aware RTL".

Well, yes, you are right of course.
But "Unicode and CodePage-aware RTL set to UTF-8" is an awkwardly long
title.
Also, many users think that the new string types will break all
their code and add lots of overhead. I want to advertise that this is
not so. On the contrary, it is very compatible, you get cross-platform
Unicode, and the overhead is pretty small.
And last but not least: programming with Unicode has become
easier, because string encoding is now more consistent.

 
> That means it has:
> - Codepage aware single-byte strings.
>    The codepage of a string may, or may not, be UTF8 (i.e. Unicode).
> - Widestrings (unicode).
> The compiler handles conversion of codepages transparently.
> 
> The codepage aware single-byte strings are not automatically UTF-8.
> On Linux, this is probably so. But on Windows, this is not necessarily so,

True, although many programmers misunderstand what this means. It is not
as scary as it sounds.
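
To illustrate, here is a minimal sketch, assuming an FPC version that
has the codepage-aware string types (3.0 or later). The program name and
the type name TCP1252String are made up for this example; StringCodePage
is the RTL function that reports the code page stored with a string.
Only ASCII text is used, so every conversion here is lossless.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical type name: an AnsiString declared with a fixed
  // code page (Windows-1252).
  TCP1252String = type AnsiString(1252);

var
  S: AnsiString;      // declared as CP_ACP, follows DefaultSystemCodePage
  U: UTF8String;      // declared as UTF-8
  W: TCP1252String;   // declared as Windows-1252

begin
  S := 'abc';
  U := S;   // the compiler inserts a conversion to UTF-8 where needed
  W := U;   // and here a conversion to Windows-1252

  // Each string carries its own code page, which can be inspected.
  WriteLn('AnsiString    code page: ', StringCodePage(S));
  WriteLn('UTF8String    code page: ', StringCodePage(U));
  WriteLn('TCP1252String code page: ', StringCodePage(W));
end.
```

The assignments need no explicit conversion calls; the declared type of
the target decides which encoding the data ends up in.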

 
> Additionally, most basic File I/O routines now correctly call the underlying
> OS's file routines with the codepage the OS expects (which is WideString on Windows).

Is it safe to say UTF-16? Or are there still UCS-2 Windows versions?

 
> The exact behaviour of the RTL is controlled by a couple of variables:
> DefaultSystemCodePage, DefaultFileSystemCodePage, DefaultRTLFileSystemCodePage.

Yes, that's the important bit that FPC made better than Delphi. :)
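
As a rough sketch of what "set to UTF-8" could look like in practice,
assuming FPC 3.0 or later: the program below prints the three variables
and then switches them to UTF-8 through the RTL's setter routines. The
program name and the Report helper are made up for this example, and the
values printed before the switch depend on the platform and its settings.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

// Hypothetical helper: prints the three RTL code page variables.
procedure Report(const Title: AnsiString);
begin
  WriteLn(Title);
  WriteLn('  DefaultSystemCodePage        = ', DefaultSystemCodePage);
  WriteLn('  DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  WriteLn('  DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
end;

begin
  Report('Values chosen by the RTL at startup:');

  // Switch all three to UTF-8 using the setter routines of the RTL
  // instead of writing to the variables directly.
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteFileSystemCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  Report('Values after switching to UTF-8:');
end.
```

Doing this once, early at program start, keeps all later string
conversions and file name handling consistent.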

 
> See http://wiki.freepascal.org/FPC_Unicode_support.


Mattias



