[Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

Mattias Gaertner nc-gaertnma at netcologne.de
Thu May 4 11:42:07 CEST 2017


On Thu, 4 May 2017 09:56:18 +0100
Tony Whyman via Lazarus <lazarus at lists.lazarus-ide.org> wrote:

>[...]
> I don't believe that string indexing even works for UTF8 strings at 
> present - at least not in a simple s[i] way.

It works the same as for UTF-16 strings: s[i] gives you a code unit,
not a whole character.
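
A minimal sketch of code-unit indexing, assuming FPC with the default
AnsiString-based "string" of {$mode objfpc}{$H+}. The byte escapes are
used so the result does not depend on the encoding of the source file;
the program name is only illustrative.

```pascal
program CodeUnitIndex;
{$mode objfpc}{$H+}
var
  s: string;         // AnsiString, used here to hold UTF-8 bytes
  u: UnicodeString;  // UTF-16
begin
  // 'a', then U+00E4 as its two UTF-8 bytes, then 'b'
  s := 'a'#$C3#$A4'b';
  WriteLn(Length(s));   // 4: length is counted in bytes
  WriteLn(Ord(s[2]));   // 195 ($C3): the first byte of U+00E4

  // U+1F600 as a UTF-16 surrogate pair
  SetLength(u, 2);
  u[1] := WideChar($D83D);
  u[2] := WideChar($DE00);
  WriteLn(Length(u));   // 2: one character, two code units
  WriteLn(Ord(u[1]));   // 55357 ($D83D): the high surrogate only
end.
```

In both cases the index addresses a code unit, so a single s[i] or u[i]
may be only part of a character.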

 
> Is it really that much overhead to have a simple codepage check before 
> calling the correct function to index a string? The obvious optimisation 
> would be to check for UTF8, then UTF16 then the Default codepage and 
> then the rest. Or perhaps UTF16 first for Windows. With register level 
> code you are talking about very few actual machine level operations.

The Char type is too small to hold a WideChar, so s[i] would have to
return a WideChar.

And in most cases the [] are used in loops. The compiler would have
to add checks on each access. It would be faster to convert the string
at the beginning to UnicodeString and back at the end.
Many RTL functions use this technique to support any string type.
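
A rough sketch of the convert-once approach, assuming FPC 3.x, where
UTF8Decode returns a UnicodeString. ReplaceCharUTF8 is a hypothetical
helper made up for this example, not an RTL function.

```pascal
program ConvertOnce;
{$mode objfpc}{$H+}

// Hypothetical helper: replaces every OldCh with NewCh in UTF-8 text.
function ReplaceCharUTF8(const S: string; OldCh, NewCh: WideChar): string;
var
  U: UnicodeString;
  i: Integer;
begin
  U := UTF8Decode(S);          // one conversion at the beginning
  for i := 1 to Length(U) do   // plain indexing, no per-access check
    if U[i] = OldCh then
      U[i] := NewCh;
  Result := UTF8Encode(U);     // one conversion back at the end
end;

begin
  // Input is 'a', U+00E4 (two UTF-8 bytes), 'b'
  WriteLn(ReplaceCharUTF8('a'#$C3#$A4'b', WideChar($00E4), WideChar('e')));  // aeb
end.
```

The loop still works on UTF-16 code units, so characters outside the
BMP occupy two indices; the point is only that the encoding is decided
once, outside the loop.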

 
> To me, a unified string type would have the advantage that:
> 
> - You would only have one managed string type "string" (and hence avoids 
> the confusion that exists today).

You can avoid the confusion by using only one string encoding,
either UTF-8 or UTF-16. The problem is that existing libraries often
support only one.

 
>[...]
> - The only time that a programmer has to think about the character 
> encoding is when writing code that interacts directly with an external 
> interface.

That's already possible with LazUTF8.
The problem is legacy code and sharing code with Delphi.
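
For illustration, a small sketch using the LazUTF8 unit. It assumes the
LazUtils package is on the unit search path and that strings hold
UTF-8, as they do in LCL applications; the program name is made up.

```pascal
program LazUtf8Demo;
{$mode objfpc}{$H+}
uses
  LazUTF8;  // part of the LazUtils package
var
  s: string;
begin
  // 'a', then U+00E4 as its two UTF-8 bytes, then 'b'
  s := 'a'#$C3#$A4'b';
  WriteLn(Length(s));                  // 4: bytes
  WriteLn(UTF8Length(s));              // 3: codepoints
  WriteLn(Length(UTF8Copy(s, 2, 1)));  // 2: the second codepoint is two bytes long
end.
```

UTF8Length and UTF8Copy count in codepoints, so the calling code does
not have to deal with byte offsets itself.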

 
>[...]

Mattias

