[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?
Martin Frb
lazarus at mfriebe.de
Sat Oct 22 03:12:34 CEST 2016
On 21/10/2016 22:16, Juha Manninen via Lazarus wrote:
> UTF-16. It does not support all the complex rules of combining
> CodePoints, but it apparently works well for accented characters in
> western languages.
>
Which ones does it not support?
When I added it to SynEdit it was complete: it had all the combining
ranges that the Unicode standard had back then (at least those I could
find in the documentation).
Of course if a new combining range is added, it will not contain it. If
that is needed one needs an external (OS or otherwise) library, that
can/will be updated on those occasions.
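
To illustrate why a fixed table goes stale, here is a minimal sketch in Python (not the SynEdit code, which is Pascal and uses its own ranges). It assumes that "combining" can be approximated by the Unicode general categories Mn, Mc and Me. The helper name is_combining is made up for this example. The answer depends on the Unicode version bundled with the runtime, which is exactly the limitation described above.

```python
import unicodedata

def is_combining(ch):
    """True if ch is a combining mark according to the bundled Unicode tables.

    Assumption for this sketch: "combining" means general category
    Mn (nonspacing), Mc (spacing combining) or Me (enclosing).
    """
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

# The tables are frozen at whatever Unicode version this runtime ships with;
# a combining range added in a later version will not be recognised.
print("Unicode data version:", unicodedata.unidata_version)

print(is_combining("\u0302"))  # COMBINING CIRCUMFLEX ACCENT
print(is_combining("a"))       # plain letter
```

A library maintained by the OS or a third party plays the same role as unicodedata here: the lookup stays the same, only the data behind it gets updated.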
Mind "combining codepoints" have nothing to do with how many codepoints
will be represented by one glyph.
"â" is one character. But it can be a single codepoint (in utf16 one
code unit or word // in utf8 several code units or bytes), or 2 codepoints
("a" + combining "^").
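
The two representations of "â" can be counted out with a small Python sketch. The helper unit_counts is a name invented for this example. It returns the number of codepoints, UTF-16 code units and UTF-8 bytes for a string.

```python
import unicodedata

precomposed = "\u00e2"   # "â" as a single codepoint
decomposed = "a\u0302"   # "a" followed by COMBINING CIRCUMFLEX ACCENT

def unit_counts(s):
    """Return (codepoints, UTF-16 code units, UTF-8 bytes) for s."""
    return (len(s), len(s.encode("utf-16-le")) // 2, len(s.encode("utf-8")))

print(unit_counts(precomposed))  # (1, 1, 2)
print(unit_counts(decomposed))   # (2, 2, 3)

# Both forms are the same character: NFC normalization composes the pair.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Either way the user sees one character, so code that steps the caret or deletes "one char" has to treat the base letter plus its combining codepoints as a unit.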
"fi" are 2 chars. But they may be rendered as 2 glyphs or 1 glyph (a ligature).
It is my understanding (but I do not know for sure) that in some
languages (such as Arabic) certain letter combinations form a single
glyph (afaik/google, see https://en.wikipedia.org/wiki/Hamzah combined
with a letter). Though maybe that is considered 2 glyphs? I do not know
Arabic at all.
Also in some scripts glyphs are displayed in an order different from
their occurrence in the text.
All of this, however, has nothing to do with combining codepoints, or
with what counts as a character.