[Lazarus] Beyond Compare 4 built with Lazarus 1.2

Marcos Douglas md at delfire.net
Sat Jan 4 12:32:25 CET 2014


On Sat, Jan 4, 2014 at 8:08 AM, Graeme Geldenhuys
<mailinglists at geldenhuys.co.uk> wrote:
> On 2014-01-04 04:34, Kostas Michalopoulos wrote:
>> Is there a way to ignore all these and make everything work with
>> UTF-8? Like setting some global variable that makes all strings
>> (AnsiStrings) use the "UTF-8 codepage", or something like that?
>
> We will have to wait for FPC 2.8.0 (or 3.0) which should have much
> better built-in Unicode support. String encoding conversion should then
> be taken care of automatically. Unfortunately it seems that the FPC RTL
> (there will be two of them) will be AnsiString or UTF-16 only. The RTL
> encoding is not configurable!
>
> So under all Unix-like systems (Linux, MacOSX, FreeBSD - basically every
> platform except Microsoft ones) there will be lots of string conversions
> between the OS or any libraries (which normally use UTF-8) and the FPC
> RTL, which is going to be UTF-16. The constant conversion will also kick
> in when you do streaming to/from file or any TCP/IP communications -
> which both normally use UTF-8.
>
> I would have thought the Free Pascal team would improve their design
> over Delphi. eg: Seeing that automatic encoding conversion is seamless,
> I thought it shouldn't be hard to have native encodings on each
> platform, and the RTL can then be a dynamic Unicode implementation (it
> shouldn't care what encoding is used, as long as it is one of the
> Unicode encodings). By that I mean UTF-8 is used under Unix like
> systems, and UTF-16 under Windows. The UnicodeString type should have
> lived up to its name, and not be an alias for UTF16String. But alas,
> this is not going to happen.
>
> So we as developers have to use UTF-16 everywhere, or define our own
> dynamic types (which really should have been done at RTL level). For
> example:
>
>   type
>   {$IFDEF Unix}
>    RealUnicodeString = UTF8String;
>   {$ENDIF}
>   {$IFDEF Windows}
>    RealUnicodeString = UTF16String;
>   {$ENDIF}
>
> Then use the RealUnicodeString type in your applications and frameworks
> to minimise encoding conversions. But like I said, when you do this
> under Unix like systems, you are still going to get conversions when
> talking to the UTF-16 only RTL. Sad, but that is the way the Free Pascal
> team is going.
>
> Once that FPC release is made, then we will start seeing what
> performance impact it will have on all systems. Now is too early to tell.

+1

You have always said this, i.e. that UnicodeString should be UTF-8 on
Unix platforms and UTF-16 on Windows, and I have always agreed with
you. It makes sense: that would be a true UnicodeString type. Is Delphi
the only "trouble" preventing this from happening?

It should be like this:

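=== BEGIN ===
type
{$IFDEF Unix}
    UnicodeString = UTF8String;
{$ENDIF}
{$IFDEF Windows}
    UnicodeString = UTF16String;
{$ENDIF}

// the alias (this would have to live in the compiler/RTL itself,
// since "string" is a reserved word and cannot be redeclared in user code)
    string = UnicodeString;

// the automatic conversions; UTF16ToUTF8/UTF8ToUTF16 stand for existing
// conversion routines such as those in Lazarus's LazUTF8 unit
function UnicodeToUTF8(const S: UnicodeString): UTF8String;
begin
  {$IFDEF Unix}
      Result := S;                 // already UTF-8: no conversion
  {$ENDIF}
  {$IFDEF Windows}
      Result := UTF16ToUTF8(S);    // UTF-16 -> UTF-8
  {$ENDIF}
end;

function UnicodeToUTF16(const S: UnicodeString): UTF16String;
begin
  {$IFDEF Unix}
      Result := UTF8ToUTF16(S);    // UTF-8 -> UTF-16
  {$ENDIF}
  {$IFDEF Windows}
      Result := S;                 // already UTF-16: no conversion
  {$ENDIF}
end;
=== END ===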
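For comparison, here is a minimal sketch of how an application could approximate this today without touching `string` itself. It assumes FPC 2.6 or later, the `UTF8Encode`/`UTF8Decode` functions from the System unit, and the `UNIX` define FPC sets on Unix-like targets; the names `TNativeString`, `NativeToUTF8` and `NativeToUTF16` are hypothetical, not part of any RTL. Calls into the UTF-16 RTL would still convert, as Graeme pointed out.

```pascal
program NativeStringSketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  // Hypothetical name: the platform's "native" Unicode string type.
  {$IFDEF UNIX}
  TNativeString = UTF8String;     // Linux, macOS, FreeBSD, ...
  {$ELSE}
  TNativeString = UnicodeString;  // UTF-16, e.g. on Windows
  {$ENDIF}

// Hypothetical helper: native string -> UTF-8 (plain assignment on Unix).
function NativeToUTF8(const S: TNativeString): UTF8String;
begin
  {$IFDEF UNIX}
  Result := S;
  {$ELSE}
  Result := UTF8Encode(S);   // System unit: UTF-16 -> UTF-8
  {$ENDIF}
end;

// Hypothetical helper: native string -> UTF-16 (plain assignment off Unix).
function NativeToUTF16(const S: TNativeString): UnicodeString;
begin
  {$IFDEF UNIX}
  Result := UTF8Decode(S);   // System unit: UTF-8 -> UTF-16
  {$ELSE}
  Result := S;
  {$ENDIF}
end;

var
  U16: UnicodeString;
  N: TNativeString;
begin
  // Build 'Grüße' from code points so the source file encoding does not matter.
  SetLength(U16, 5);
  U16[1] := 'G';
  U16[2] := 'r';
  U16[3] := WideChar($00FC);  // u with diaeresis
  U16[4] := WideChar($00DF);  // sharp s
  U16[5] := 'e';

  // Store it in the native representation (one conversion on Unix).
  {$IFDEF UNIX}
  N := UTF8Encode(U16);
  {$ELSE}
  N := U16;
  {$ENDIF}

  Assert(Length(NativeToUTF16(N)) = 5);  // 5 UTF-16 code units
  Assert(Length(NativeToUTF8(N)) = 7);   // the two accented letters take 2 bytes each
  Assert(NativeToUTF16(N) = U16);        // lossless round trip
  WriteLn('ok');
end.
```

On Unix `NativeToUTF8` costs nothing and `NativeToUTF16` converts; on Windows it is the other way round, which is the asymmetry the proposal above is about.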

Maybe we are missing something; there are many details...

Regards,
Marcos Douglas



