[Lazarus] Improving UTF8CharacterLength?

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Aug 13 13:50:13 CEST 2015


On Thu, 13 Aug 2015 13:23:28 +0200
Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:

>[...]
>  >> if ((ord(p^) and %11110000) = %11100000) then
>  >>     begin  // could be 3 byte character
>  >>     if ((ord(p[1]) and %11000000) = %10000000) and
>  >>        ((ord(p[2]) and %11000000) = %10000000) then ...
>  >>     ...
>  >> ------------
>  >> In the above (current) code, 3 bytes are accessed, which may step past the zero byte.
>  > The "and" operator stops evaluating if left side is already false.
> 
> Only if you have a valid UTF-8 string.

I can't follow you here. If the string is valid UTF-8, then p[1] and p[2]
are continuation bytes (%10xxxxxx) and therefore not zero.

> I thought we were talking about *invalid* UTF-8 strings where
> it can happen that p[2] is accessed although it is not part
> of the string.

I can't follow you here. If p[2] is not part of the string, then p[1]
must be the terminating #0. The check on p[1] then fails, and the "and"
stops before p[2] is read.
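
To illustrate the argument, here is a minimal sketch. It is not the actual
LazUTF8 code: IsThreeByteSequence and the three test buffers are hypothetical
names made up for this example. It assumes a #0-terminated buffer and the
default short-circuit boolean evaluation ({$B-}):

```pascal
program ShortCircuitDemo;
{$mode objfpc}{$H+}
{$B-} // short-circuit boolean evaluation (the FPC default), stated explicitly

// Hypothetical helper for illustration only.
// Returns True if p points at a lead byte of the form %1110xxxx
// that is followed by two continuation bytes of the form %10xxxxxx.
function IsThreeByteSequence(p: PChar): Boolean;
begin
  Result := ((Ord(p^)   and %11110000) = %11100000)
        and ((Ord(p[1]) and %11000000) = %10000000)  // fails for #0, so p[2] is not read
        and ((Ord(p[2]) and %11000000) = %10000000); // only reached if p[1] <> #0
end;

const
  // Each buffer ends with an explicit #0 terminator.
  Complete:    array[0..3] of Char = (#$E2, #$82, #$AC, #0); // U+20AC
  LeadOnly:    array[0..1] of Char = (#$E2, #0);             // p[1] is the terminator
  LeadPlusOne: array[0..2] of Char = (#$E2, #$82, #0);       // p[2] is the terminator

begin
  WriteLn(IsThreeByteSequence(@Complete[0]));    // TRUE
  WriteLn(IsThreeByteSequence(@LeadOnly[0]));    // FALSE, stops after reading p[1]
  WriteLn(IsThreeByteSequence(@LeadPlusOne[0])); // FALSE, p[2] is the #0 itself
end.
```

In both truncated cases the highest index that gets read is the terminating
#0 itself, never a byte after it. With {$B+} (full boolean evaluation) all
three operands would be evaluated, and this guarantee would no longer hold.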

Mattias