[Lazarus] substr return wrong string with some utf8 char

Jürgen Hestermann juergen.hestermann at gmx.de
Fri Feb 11 18:44:12 CET 2011


The current thicket of different character encodings should be reduced 
to the unavoidable minimum. I think the decision for UTF8 is very good. 
It is Unicode based, has the least impact on memory, and the other 
Unicode encodings also need multiple bytes per character (with the 
drawback of increased memory consumption). The only reason to convert 
to other encodings (like ANSI) should be to satisfy non-UTF8 interface 
requirements (like the OS API).

And I believe that we need *two* string functions for index 
calculations: NumberOfBytes to determine the number of bytes (as Length 
does now) and NumberOfCharacters to report the number of characters. 
Then everybody has to realize that in general NumberOfBytes is faster to 
calculate, so it should be preferred whenever the number of characters 
is not actually required.
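
To make the difference concrete, here is a minimal sketch of the two 
functions in Free Pascal. NumberOfBytes and NumberOfCharacters are only 
the names proposed above, not existing RTL or LCL routines, and 
"character" is assumed here to mean a Unicode code point in a valid 
UTF8 string (combining sequences are not merged).

```pascal
program Utf8Counts;
{$mode objfpc}{$H+}

{ Hypothetical name from this post: the byte count, same value as Length. }
function NumberOfBytes(const S: string): SizeInt;
begin
  Result := Length(S);
end;

{ Hypothetical name from this post: counts UTF-8 code points by skipping
  continuation bytes (bit pattern 10xxxxxx). Assumes S is valid UTF-8. }
function NumberOfCharacters(const S: string): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: string;
begin
  { "Jürgen" spelled out as UTF-8 bytes so the source file encoding
    does not matter: the u-umlaut is the two bytes $C3 $BC. }
  S := 'J'#$C3#$BC'rgen';
  WriteLn('NumberOfBytes:      ', NumberOfBytes(S));       { 7 }
  WriteLn('NumberOfCharacters: ', NumberOfCharacters(S));  { 6 }
end.
```

The sketch also shows why the byte count is cheaper: it is read from 
the stored length, while the character count has to walk the whole 
string. Lazarus already ships a comparable code point counter, 
UTF8Length.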

So for me the ultimate string type is UTF8 stored like ansistrings, plus 
new functions to retrieve NumberOfBytes and NumberOfCharacters (even 
though Length already does the first task, the new name would make its 
purpose much clearer).



