[Lazarus] substr return wrong string with some utf8 char

Michael Schnell mschnell at lumino.de
Fri Feb 11 16:01:32 CET 2011


On 02/11/2011 02:26 PM, Hans-Peter Diettrich wrote:
>
> How would you determine the byte count for reading and writing text?
E.g. when using a stream? Good question. As, AFAIK, this is no more than 
a still-incomplete project in the svn, I don't know.
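
To make the question concrete, here is a minimal sketch of how the byte count 
for writing text depends on the encoding picked at the stream boundary. It is 
written in Python purely for illustration and is not the Pascal string type 
discussed in this thread; the helper name write_text is hypothetical.

```python
import io

def write_text(stream, text, encoding):
    """Encode text, write it to a binary stream, return the bytes written."""
    data = text.encode(encoding)
    stream.write(data)
    return len(data)

# "Grüße" written with escapes so the source file encoding does not matter.
text = "Gr\u00fc\u00dfe"          # 5 characters

utf8_stream = io.BytesIO()
utf16_stream = io.BytesIO()

# The character count is the same, the byte count is not:
utf8_bytes = write_text(utf8_stream, text, "utf-8")        # ü and ß need 2 bytes each
utf16_bytes = write_text(utf16_stream, text, "utf-16-le")  # 2 bytes per character here

print(len(text), utf8_bytes, utf16_bytes)
```

In this sketch the byte count is only known once the text has been encoded 
for the target stream, so it cannot be derived from the character count alone.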
>
>> So "Length" with this type can be defined as "character count" and 
>> copy can work on character length and position, and automatically 
>> convert strings if they are coded differently.
>
> I don't like automatic string conversion, because:
>> Of course certain operations might be really slow if the encoding of 
>> the data is not appropriate.
>
> Consider what will happen when every procedure or component has its 
> *own* idea of the "appropriate" encoding...

As always, comfort can be traded against speed. If the user wants speed 
he needs to take care that as few conversions as possible are done.

If he just uses this string type and does not explicitly enforce an 
encoding, no conversion is necessary except on entry to and exit from his 
code. And the same code will work without conversion for any encoding used 
at entry and exit, provided they are all identical.

E.g. the Windows System API will use UTF-16, while the Linux System API 
uses UTF-8 for things like "caption" and "Text". The (even binary) 
unmodified user code will not need to do conversions for this kind of 
GUI work. (AFAIK, string constants are re-encoded on the first use, if 
necessary).
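
The idea of a character-based copy that gives the same result whatever the 
stored encoding can be sketched as follows. This is an illustration in Python 
under stated assumptions, not the proposed Pascal implementation: char_copy 
is a hypothetical helper, positions are 1-based as in Pascal's Copy, and 
"character" here means a Unicode code point.

```python
def char_copy(raw, encoding, start, count):
    """Copy `count` characters from 1-based character position `start`.

    `raw` holds bytes in `encoding`; the result is returned in the same
    encoding, so conversion happens only on entry and exit.
    """
    text = raw.decode(encoding)                 # entry: bytes -> characters
    piece = text[start - 1:start - 1 + count]   # work on character positions
    return piece.encode(encoding)               # exit: characters -> bytes

word = "na\u00efve"                     # "naïve", 5 characters
as_utf8 = word.encode("utf-8")          # 6 bytes: ï takes two
as_utf16 = word.encode("utf-16-le")     # 10 bytes

# The same call gives the same characters for both encodings.
copy_utf8 = char_copy(as_utf8, "utf-8", 3, 2).decode("utf-8")
copy_utf16 = char_copy(as_utf16, "utf-16-le", 3, 2).decode("utf-16-le")

# A byte-based copy of the UTF-8 data at the same position and length
# returns only "ï", because both bytes belong to that one character.
byte_copy = as_utf8[2:4].decode("utf-8")

print(copy_utf8, copy_utf16, byte_copy)
```

The byte-based result is the kind of wrong substring the subject line of this 
thread describes, while the character-based version behaves identically for 
UTF-8 and UTF-16 input.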

-Michael



