[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?

Martin Frb lazarus at mfriebe.de
Sat Oct 22 03:12:34 CEST 2016


On 21/10/2016 22:16, Juha Manninen via Lazarus wrote:
> UTF-16. It does not support all the complex rules of combining
> CodePoints, but it apparently works well for accented characters in
> western languages.
>

Which ones does it not support?
When I added it to SynEdit it was complete. It covered all the combining 
codepoints that the Unicode standard defined back then (at least all 
that I could find in the documentation).

Of course, if a new combining range is added, it will not contain it. If 
that is needed, one needs an external (OS or otherwise) library that 
can/will be updated on those occasions.

Mind that "combining codepoints" have nothing to do with how many 
codepoints will be represented by one glyph.

"â" is one character. But it can be a single codepoint (one code unit, 
i.e. one word, in UTF-16; several code units, i.e. bytes, in UTF-8), or 
2 codepoints ("a" + combining "^").
"fi" are 2 chars. But they may be rendered as 2 glyphs or 1 glyph (ligature).
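
A minimal sketch of the codepoint/code-unit distinction for "â", written in 
Python rather than Pascal so it only relies on the standard library. The 
names `precomposed`, `decomposed` and `describe` are made up for this 
illustration and are not part of SynEdit or the LCL.

```python
import unicodedata

# The same character, encoded two ways.
precomposed = "\u00e2"   # "â" as a single codepoint
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT


def describe(s):
    """Count codepoints and code units, and flag combining codepoints."""
    return {
        "codepoints": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # A non-zero canonical combining class marks a combining codepoint.
        "combining": [unicodedata.combining(ch) != 0 for ch in s],
    }


info_pre = describe(precomposed)
info_dec = describe(decomposed)

# NFC normalization composes the two-codepoint form into the single one.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(info_pre)
print(info_dec)
print(same_after_nfc)
```

Both strings are the same single character to a reader. Only the number of 
codepoints and code units differs, which is why an editor has to treat the 
base letter and the combining mark as one unit for caret movement.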

It is my understanding (but I do not know for sure) that in some 
languages (such as Arabic) certain letter combinations form a single 
glyph (as far as I can tell from a quick search, see 
https://en.wikipedia.org/wiki/Hamzah combined with a letter). Though 
maybe it is considered 2 glyphs? I do not know Arabic at all.
Also, in some scripts glyphs are displayed in an order different from 
their occurrence in the text.
All of this, however, has nothing to do with combining codepoints, or 
what counts as a character.


