[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?

Fri Oct 21 23:16:34 CEST 2016

On Fri, Oct 21, 2016 at 2:26 PM, Juha Manninen
<juha.manninen62 at gmail.com> wrote:
> No, neither FPC nor Lazarus have library code to deal with [combined CodePoints] yet.
> The goal is to have an enumerator for user perceived characters, just
> like LazUnicode unit has for encoding agnostic CodePoints.

Sorry, that was not accurate.
Unit LazUnicode already has TUnicodeCharacterEnumerator which is able
to iterate combined accented Unicode characters.
It calls either function UTF8IsCombining or UTF16IsCombining depending
on the default encoding in use. Yes, Delphi and UTF-16 are supported.
The code was basically copied from SynEdit and then also ported to
UTF-16. It does not support all the complex rules of combining
CodePoints, but it apparently works well for accented characters in
Western languages.
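
As a rough illustration, here is a minimal sketch of driving
TUnicodeCharacterEnumerator by hand. It assumes a Lazarus version whose
LazUtils package ships LazUnicode, UTF-8 as the string encoding, and that
the class exposes Create(const A: String), MoveNext and a String-valued
Current. The program name and the sample string are made up for the
example.

```pascal
program CombinedCharDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  LazUnicode;  // part of the LazUtils package

var
  s: String;
  en: TUnicodeCharacterEnumerator;
begin
  // 'e' followed by U+0301 (combining acute accent, UTF-8 bytes CC 81),
  // then a plain 'x'. Written with byte escapes so the source file
  // encoding does not matter.
  s := 'e'#$CC#$81'x';

  // Assumption: the constructor takes the string to iterate.
  en := TUnicodeCharacterEnumerator.Create(s);
  try
    while en.MoveNext do
      // Current holds one user perceived character, so the base letter
      // and its combining accent are expected to arrive together.
      WriteLn(en.Current, ' (', Length(en.Current), ' code units)');
  finally
    en.Free;  // created by hand here, so it must be freed by hand
  end;
end.
```

Creating the enumerator directly sidesteps the commented-out operator, at
the cost of managing its lifetime yourself.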

This:
 operator Enumerator(A: String): TUnicodeCharacterEnumerator;
would enable it for the for-in loop, but it is commented out for now. The
current for-in loop enumerator iterates over CodePoints.
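
A sketch of the current for-in behaviour, assuming LazUnicode is in the
uses clause and the loop variable is declared as String. The program name
and the sample string are again hypothetical. Each iteration should yield
one CodePoint, so a base letter and its combining accent arrive as two
separate items:

```pascal
program CodePointLoopDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  LazUnicode;  // brings the CodePoint enumerator operator into scope

var
  s, cp: String;  // cp is a String, not a Char: one CodePoint may span several bytes
begin
  // 'e' + U+0301 (combining acute accent, UTF-8 bytes CC 81) + 'x'
  s := 'e'#$CC#$81'x';

  // With LazUnicode in scope the for-in loop walks CodePoints,
  // so the accent is a separate item from the letter before it.
  for cp in s do
    WriteLn(Length(cp), ' code unit(s) in this CodePoint');
end.
```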

There is a test project in components/lazutils/test/LazUnicodeTest.lpi.
It includes combining CodePoints, too. Please take a look if you are interested.

Juha