[Lazarus] Improving UTF8CharacterLength?
Mattias Gaertner
nc-gaertnma at netcologne.de
Thu Aug 13 13:50:13 CEST 2015
On Thu, 13 Aug 2015 13:23:28 +0200
Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
>[...]
> >> if ((ord(p^) and %11110000) = %11100000) then
> >> begin // could be 3 byte character
> >> if ((ord(p[1]) and %11000000) = %10000000) and
> >> ((ord(p[2]) and %11000000) = %10000000) then ...
> >> ...
> >> ------------
> >> In the above (current) code, 3 bytes are accessed, which may read past the terminating zero byte.
> > The "and" operator stops evaluating if the left side is already false.
>
> Only if you have a valid UTF-8 string.
I can't follow you here. If the string is valid UTF-8, then p[1] and p[2]
are not zero.
> I thought we were talking about *invalid* UTF-8 strings where
> it can happen that p[2] is accessed although it is not part
> of the string.
I can't follow you here. If p[2] is not part of the string, then p[1]
must be the terminating #0, so the p[1] test is already false and the
"and" never evaluates the p[2] test.
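A minimal Free Pascal sketch of the argument above, assuming short-circuit boolean evaluation ({$B-}, which is FPC's default; under {$B+} / {$BOOLEVAL ON} both operands of "and" are always evaluated and p[2] would be read). The helpers `ByteAt` and `LooksLike3ByteChar` and the `MaxIndexRead` counter are hypothetical, introduced only to record how far the check reads; this is not the actual UTF8CharacterLength code. The check has the same shape as the quoted snippet.

```pascal
program ShortCircuitDemo;
{$mode objfpc}{$H+}
{$B-} // short-circuit evaluation of "and" (FPC's default, stated explicitly)

var
  MaxIndexRead: Integer; // hypothetical: furthest offset from p dereferenced so far

// Hypothetical helper: reads p[i] and records how far we have read.
function ByteAt(p: PChar; i: Integer): Byte;
begin
  if i > MaxIndexRead then
    MaxIndexRead := i;
  Result := Ord(p[i]);
end;

// Same test as the quoted snippet: lead byte 1110xxxx followed by
// two continuation bytes 10xxxxxx.
function LooksLike3ByteChar(p: PChar): Boolean;
begin
  Result := False;
  if (ByteAt(p, 0) and %11110000) = %11100000 then
    Result := ((ByteAt(p, 1) and %11000000) = %10000000) and
              ((ByteAt(p, 2) and %11000000) = %10000000);
end;

const
  Valid: array[0..3] of Char = (#$E2, #$82, #$AC, #0); // U+20AC EURO SIGN
  Truncated: array[0..1] of Char = (#$E2, #0);         // lead byte, then terminator

var
  IsChar: Boolean;
begin
  MaxIndexRead := -1;
  IsChar := LooksLike3ByteChar(@Valid[0]);
  WriteLn('valid:     ', IsChar, ', highest offset read = ', MaxIndexRead);

  MaxIndexRead := -1;
  IsChar := LooksLike3ByteChar(@Truncated[0]);
  WriteLn('truncated: ', IsChar, ', highest offset read = ', MaxIndexRead);
end.
```

For the valid sequence the check reads up to offset 2. For the truncated one, p[1] is the #0 terminator, its continuation test fails, and evaluation stops at offset 1, so the byte after the terminator is never touched.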
Mattias
More information about the Lazarus mailing list