[Lazarus] substr return wrong string with some utf8 char
Jürgen Hestermann
juergen.hestermann at gmx.de
Fri Feb 11 18:44:12 CET 2011
The current thicket of different character encodings should be reduced
to the unavoidable minimum. I think the decision for UTF8 is very good.
It is Unicode based, has the least impact on memory, and all other
Unicode encodings need multiple bytes for characters too (with the
drawback of increased memory consumption). The only reason to convert to
other encodings (like ANSI) should be to satisfy non-UTF8 interface
requirements (like OS API).
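
To make the memory argument concrete, here is a small sketch in Python (chosen only for illustration; the helper name encoded_sizes is hypothetical). It encodes the same text in three Unicode encodings and reports the byte counts. The numbers hold for these samples only; other scripts give different ratios.

```python
def encoded_sizes(text: str) -> dict:
    """Return the size in bytes of `text` in three Unicode encodings."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


# Latin-script sample with one non-ASCII letter (U+00FC).
print(encoded_sizes("J\u00fcrgen"))
# {'utf-8': 7, 'utf-16-le': 12, 'utf-32-le': 24}

# A character outside the Basic Multilingual Plane (U+1D11E):
# UTF-16 needs two 16-bit units here, so it is not fixed-width either.
print(encoded_sizes("\U0001D11E"))
# {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

The second sample shows that UTF-16 also uses a variable number of code units per character, so it does not avoid the indexing problem.
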
And I believe that we need *two* string functions for index
calculations: NumberOfBytes to determine the number of bytes (as Length
does now) and NumberOfCharacters to report the number of characters.
Everybody then has to realize that NumberOfBytes is in general faster to
calculate, so it should be preferred whenever the number of characters
is not actually required.
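
A minimal sketch of the two proposed functions, written in Python for illustration rather than Pascal. The names number_of_bytes and number_of_characters mirror the proposal and are not existing library functions. It assumes the input is valid UTF-8 and counts "characters" as Unicode code points (combining sequences are not merged).

```python
def number_of_bytes(s: bytes) -> int:
    """Byte length of a UTF-8 string; O(1), the length is stored with the buffer."""
    return len(s)


def number_of_characters(s: bytes) -> int:
    """Code-point count of a valid UTF-8 string; O(n), every byte is inspected.

    Continuation bytes have the bit pattern 10xxxxxx, so each byte that
    does NOT match that pattern starts a new code point.
    """
    return sum(1 for b in s if (b & 0xC0) != 0x80)


text = "J\u00fcrgen".encode("utf-8")   # U+00FC takes two bytes in UTF-8
print(number_of_bytes(text))        # 7
print(number_of_characters(text))   # 6
```

The byte count is a stored value, while the character count has to scan the whole string. That is why the byte-based function is the cheaper default.
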
So for me the ultimate string type is UTF8 stored like ansistrings, plus
new functions to retrieve NumberOfBytes and NumberOfCharacters (even if
Length already does the first task, the new name would make its purpose
much clearer).