[Lazarus] Unicode on Windows

Tue Apr 10 01:45:38 CEST 2012

Mattias Gaertner wrote:

>> Yes. For Unicode encoding we would need new functions to distinguish 
>> between number of bytes and number of (visible) glyphs:
>>
>> LengthInBytes()
>> LengthInGlyphs()

It should be mentioned that Unicode allows different encodings of 
composed and decomposed characters. E.g. 'é' can be stored as a single 
precomposed codepoint (U+00E9) or as two codepoints, 'e' followed by 
the combining acute accent (U+0301). Even though both encodings look 
the same on screen, Pos (or UTF8Pos) will only find the encoding given 
in the search string. It also has to be specified what LengthInGlyphs 
should really return: the number of glyphs actually visible? And what 
about ligatures etc.?
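
To illustrate the composed/decomposed mismatch, here is a minimal sketch 
in Python rather than Pascal, using only the standard unicodedata module. 
The helper find_normalized is a hypothetical name made up for this 
example; it is not a Lazarus or FPC function. It assumes that bringing 
both strings to the same normalization form (NFC here) before searching 
is acceptable for the use case.

```python
import unicodedata

composed = "\u00e9"      # 'é' as one precomposed code point
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The text stores the decomposed form, the search string the composed one.
text = "caf" + decomposed


def find_normalized(haystack, needle):
    """Hypothetical helper: search after normalizing both strings to NFC.

    The returned index refers to the normalized haystack, not the original.
    """
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_haystack.find(nfc_needle)


# A plain code-point search does not see the two forms as equal.
plain_result = text.find(composed)

# After normalization the match is found.
normalized_result = find_normalized(text, composed)
```

Note that the index returned refers to the normalized copy, so it cannot 
be used directly as a position in the original string.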

Every user has to know which kind of "length" he really wants to get:
- number of bytes for storage in a fixed-size variable or streaming
- number of glyphs for length-restricted user input
- number of pixels for GUI layout (TextWidth)
...
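
The first two kinds of "length" can be sketched as follows, again in 
Python and only as an illustration. The function names are hypothetical, 
modeled on the LengthInBytes/LengthInGlyphs proposal quoted above. The 
glyph count is only an approximation: it skips codepoints that Unicode 
classifies as combining marks and ignores ligatures, emoji sequences and 
full grapheme cluster rules. Pixel width is left out because it depends 
on the font and the GUI toolkit.

```python
import unicodedata


def length_in_bytes(s, encoding="utf-8"):
    """Number of bytes needed to store s in the given encoding."""
    return len(s.encode(encoding))


def length_in_code_points(s):
    """Number of Unicode code points in s."""
    return len(s)


def length_in_glyphs_approx(s):
    """Rough glyph count: code points that are not combining marks.

    Assumption: every non-combining code point is one visible glyph.
    This is wrong for ligatures and other multi-code-point clusters.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


decomposed = "cafe\u0301"   # 'e' + combining acute accent
composed = "caf\u00e9"      # precomposed 'é'

for sample in (decomposed, composed):
    print(
        length_in_bytes(sample),
        length_in_code_points(sample),
        length_in_glyphs_approx(sample),
    )
```

Both strings look identical on screen, yet they differ in byte length 
and codepoint count; only the approximate glyph count agrees.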

DoDi