[Lazarus] String vs WideString

Tony Whyman tony.whyman at mccallumwhyman.com
Tue Aug 15 11:15:36 CEST 2017


On 14/08/17 22:01, Juha Manninen via Lazarus wrote:
> Tony Whyman, this issue has been discussed again and again for the
> past 10+ years first in FPC mailing lists and then in Lazarus lists.
> The current Unicode support in Lazarus works f***ing well and is
> amazingly compatible with Delphi.
> WinAPI parameters may require an explicit temporary UnicodeString
> variable but even then the code is compatible with Delphi.
>
> Tony Whyman, Marcos Douglas and Michael Schnell, please study the facts.
> For starters, this is about the current Unicode support in Lazarus:
>    http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> I think the dynamic encoding and automatic conversion now work perfectly well.
> If you have a piece of code where it does not work, please ask for
> detailed info.
If a topic keeps on being discussed after 10+ years of argument, the 
reason is usually either (a) the problem and its solution have not been 
documented properly, or (b) the outcome is an unsatisfactory compromise.

In this case, I would argue that both are true.

I went back and read the wiki article you mentioned and was no more the 
wiser as to why the current mess exists. Is the reason really no more than 
that Delphi continues to screw up in this area, so FPC must too? The body 
of the article appears to be a set of notes - not necessarily wrong in 
themselves but lacking the background and context needed to explain why 
it is like it is.

This problem will keep coming up until it is fixed properly and, by 
that, I mean that the solution is consistent, intuitively understandable 
and well documented. Windows eccentricities also need to be kept to Windows.

Here is my wish list:

1. Stop using the term "Unicode".

    It is too ambiguous. It is used both as an all-embracing term for
    multi-byte encoding and as a synonym for UTF16, and that is really
    too confusing. The problem is made worse by having UnicodeString as
    a two-byte-wide string type in both FPC and Delphi.


2. Clean up the char type.

    When Wirth created the "char" type in Pascal it was a simple ASCII
    or EBCDIC character. There are now seven different char types
    (including type equivalence) with no guidelines on when each is
    applicable. This is too many. Why shouldn't there be a single char
    type that intuitively represents a single character regardless of
    how many bytes are used to represent it? Yes, in a world where we
    have to live with UTF8, UTF16, UTF32, legacy code pages and Chinese
    variations on UTF8, that means that dynamic attributes have to be
    included in the type. But isn't that the only way to have consistent
    and intuitive character handling?
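
To make the "how many bytes" point concrete, here is a small sketch. It
is written in Python only because that is compact; nothing in it is FPC
or Delphi API, and the helper name encoded_sizes is made up for this
illustration. It reports how many bytes a single code point occupies in
UTF8, UTF16 and UTF32.

```python
# Illustrative only: byte count of one code point in each UTF encoding.
# The "-le" variants are used so that no byte order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch):
    """Return {encoding name: number of bytes} for a single code point."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    # Latin letter, accented letter, euro sign, and an emoji outside the BMP.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The same logical character needs between one and four bytes depending
on the encoding, so a char type of any fixed width can only cover some
of these cases.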


3. The problem with string handling today is that it is not based on a 
consistent approach to the character type.

    If you clean up character handling then the model for string
    handling should become obvious. A string is, after all, no more than
    a container for a character array, all of which should be
    constrained to the same character encoding. A string should intuitively
    represent a string of text regardless of how many bytes are used to
    represent each character and with dynamic attributes to tell you how
    it is encoded.
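
As a rough sketch of that model (again in Python, and purely
hypothetical: the EncodedText class and its methods are invented here
and do not correspond to any FPC or Delphi type), a string can be a
byte buffer plus a dynamic encoding attribute, with length and indexing
defined in characters rather than bytes. One assumption to flag: a
"character" here means a Unicode code point, so combining sequences are
not treated as a single unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical string model: raw bytes plus a dynamic encoding tag."""

    raw: bytes
    encoding: str

    def __len__(self):
        # Length in characters (code points), not in bytes.
        return len(self.raw.decode(self.encoding))

    def __getitem__(self, index):
        # Index by character position, whatever the byte layout is.
        return self.raw.decode(self.encoding)[index]

    def convert(self, encoding):
        # Re-encode the same text; only the byte representation changes.
        text = self.raw.decode(self.encoding)
        return EncodedText(text.encode(encoding), encoding)


if __name__ == "__main__":
    utf8 = EncodedText("na\u00efve".encode("utf-8"), "utf-8")
    utf16 = utf8.convert("utf-16-le")
    print(len(utf8), len(utf8.raw))    # character count, byte count
    print(len(utf16), len(utf16.raw))
```

Code that works through the character-level interface never needs to
know which encoding is in use; only code that reaches for the raw
bytes does.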


4. FPC should clean up Delphi's mess for it. If a unified string type 
follows a consistent model then it should be possible to make all Delphi 
string types synonyms.

    You will need to allow exceptions for legacy programs that insist on
    manipulating the bytes themselves - but that is not rocket science.
    There is also the issue of the Windows API and its insistence on
    Wide Strings - but isn't that why calling conventions such as cdecl
    and stdcall exist - to tell the compiler when it needs to reformat
    the call for a given API convention?

Tony Whyman


