[Lazarus] substr return wrong string with some utf8 char

Hans-Peter Diettrich DrDiettrich1 at aol.com
Fri Feb 11 15:06:50 CET 2011


José Mejuto wrote:

> If no checks of utf8 integrity are performed, they should not be
> "a lot slower", only a bit slower, at least utf8pos; utf8copy is
> for sure slower.

I see no need for integrity checks when the procedures are called with
reasonable arguments. Before e.g. Copy can be called, its parameters
have to be determined, and *this* is where using the appropriate
functions automatically yields valid arguments.

> A different thing is that the current implementation is a bit
> overengineered, which adds some overhead.
> 
> Is it logical/safe that utf8 functions do not check utf8 integrity ?
> I'm talking about utf8pos, utf8copy, etc...

There is no need for a utf8pos function to accompany utf8copy, because
Pos already returns the correct start index for Copy. Only the count
parameter needs different handling in utf8copy: the byte count can be
determined once, e.g. in a (UTF8)ByteCount function. Copy can then
immediately allocate the requested number of bytes and move that many
bytes. The ByteCount function is not needed when the end index is
already known, e.g. from another Pos call.
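
A minimal sketch of this idea, assuming valid UTF-8 input and a start
index that falls on the first byte of a character. UTF8ByteCount is a
hypothetical helper written for this illustration, not an existing
LCL/RTL routine; Pos and Copy are the ordinary byte-based functions.

```pascal
program Utf8CopyDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes occupied by CharCount characters
// (code points) starting at byte index Start (1-based).
// Assumes S is valid UTF-8; no integrity checks are performed.
function UTF8ByteCount(const S: AnsiString; Start, CharCount: Integer): Integer;
var
  I: Integer;
begin
  I := Start;
  while (CharCount > 0) and (I <= Length(S)) do
  begin
    Inc(I);  // step over the lead byte
    // step over continuation bytes (bit pattern 10xxxxxx)
    while (I <= Length(S)) and ((Ord(S[I]) and $C0) = $80) do
      Inc(I);
    Dec(CharCount);
  end;
  Result := I - Start;
end;

var
  S, TwoChars, FirstWord: AnsiString;
  Start: Integer;
begin
  // 'José Mejuto', with the e-acute written as its two UTF-8 bytes
  S := 'Jos'#$C3#$A9' Mejuto';

  // Pos returns a byte index that Copy can use directly.
  Start := Pos('s', S);

  // Only the count needs converting from characters to bytes.
  TwoChars := Copy(S, Start, UTF8ByteCount(S, Start, 2));
  WriteLn(Length(TwoChars));   // 3: one byte for 's', two for the e-acute

  // When the end index comes from another Pos call, no ByteCount is needed.
  FirstWord := Copy(S, 1, Pos(' ', S) - 1);
  WriteLn(Length(FirstWord));  // 5 bytes for the 4 characters of the first word
end.
```

The string is scanned once to obtain the byte count, after which Copy
works exactly as it does for single-byte strings.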

It would also help text integrity if indexed access to bytes/chars in
(MBCS/UTF) strings were simply dropped. Then either a different string
type or different access methods would have to be used, at the coder's
choice.
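
As one possible reading of "different access methods", the sketch below
walks a string character by character instead of indexing into it.
NextUTF8Char is a hypothetical name chosen for this example, and the
code again assumes valid UTF-8 without checking it.

```pascal
program Utf8NextDemo;
{$mode objfpc}{$H+}

// Hypothetical access method: returns the character (code point) that
// starts at byte index Idx and advances Idx past it.
// Assumes S is valid UTF-8 and Idx points at a lead byte.
function NextUTF8Char(const S: AnsiString; var Idx: Integer): AnsiString;
var
  First: Integer;
begin
  First := Idx;
  Inc(Idx);  // step over the lead byte
  // step over continuation bytes (bit pattern 10xxxxxx)
  while (Idx <= Length(S)) and ((Ord(S[Idx]) and $C0) = $80) do
    Inc(Idx);
  Result := Copy(S, First, Idx - First);
end;

var
  S, Ch: AnsiString;
  I, N: Integer;
begin
  // 'José', with the e-acute written as its two UTF-8 bytes
  S := 'Jos'#$C3#$A9;
  I := 1;
  N := 0;
  while I <= Length(S) do
  begin
    Ch := NextUTF8Char(S, I);  // Ch is always a whole character
    Inc(N);
  end;
  WriteLn(N);           // 4 characters
  WriteLn(Length(S));   // 5 bytes
end.
```

Because the caller never handles a byte index it did not get from such
a function, it cannot cut a multi-byte character in half.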

DoDi




