[Lazarus] dynamic string proposal

Martin Frb lazarus at mfriebe.de
Wed Aug 16 16:36:15 CEST 2017


On 16/08/2017 13:37, Alexey via Lazarus wrote:
> On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
>>
>> A char can be composed of several combining code points (each of them 
>> afaik, in the 32 bit range).
>> So a char can have 96 or more bits. (And not all of them have a 
>> combined form).
>
> See my prev post: i see that each S[i] good to be like QWord 
> (sizeof(one char)= sizeof(Qword)). It can be TextChar. And type can be 
> TextString. internally it can be compressed to utf8. TextString is 
> good if i want to parse text by "chars". If "char" needs more bytes- 
> lets take more (internally it is same utf8)
>

Have a look at 
https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_longest_unicode_character/

There is ONE character that comprises more than 200 codepoints.
The only way to store such a char is in a type of dynamic size (aka string).

Well, I couldn't find an official doc on what defines the boundaries of a char.

But as far as I can see: if ä is one character, and it can be encoded as 
"non-combining codepoint" + "combining codepoint", then a character is 
any sequence of one "non-combining codepoint" + zero or more "combining 
codepoints". (AFAIK Arabic script has chars that carry several 
"combining codepoints", so this is happening in actual languages.)

The example as far as I checked fulfils this definition.
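
To make that working definition concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Pascal). The function name split_chars is hypothetical, and "combining codepoint" is assumed to mean any code point whose Unicode general category is a Mark (Mn, Mc, Me). That is a simplification of real character boundary rules, not an official definition.

```python
import unicodedata


def split_chars(text):
    """Group code points into "chars" under the working definition:
    one non-combining code point followed by zero or more combining ones.

    Assumption: a code point counts as "combining" when its general
    category starts with "M" (Mn, Mc, Me).
    """
    chars = []
    for cp in text:
        is_combining = unicodedata.category(cp).startswith("M")
        if chars and is_combining:
            # Attach the combining code point to the char in progress.
            chars[-1] += cp
        else:
            # A non-combining code point (or a stray leading mark)
            # starts a new char.
            chars.append(cp)
    return chars


precomposed = "\u00e4"         # ä as a single code point
decomposed = "a\u0308"         # 'a' + COMBINING DIAERESIS
stacked = "a" + "\u0308" * 200  # one base + 200 combining code points

print(len(split_chars(precomposed)), len(precomposed))  # 1 1
print(len(split_chars(decomposed)), len(decomposed))    # 1 2
print(len(split_chars(stacked)), len(stacked))          # 1 201
```

Under this definition the last string is still one char, even though it holds 201 code points, so no fixed-size type (QWord or otherwise) can hold every possible char.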


