[Lazarus] Improving UTF8CharacterLength?

Thu Aug 13 13:23:28 CEST 2015

On 2015-08-13 at 13:01, Mattias Gaertner wrote:
 > On Thu, 13 Aug 2015 12:38:00 +0200
 > Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
 >> On 2015-08-13 at 11:55, Mattias Gaertner wrote:
 >>  > A string always ends with a #0, so checking byte by byte makes sure you
 >>  > stay within range.
 >> Not quite true:
 >> ------------
 >> if ((ord(p^) and %11110000) = %11100000) then
 >>     begin  // could be 3 byte character
 >>     if ((ord(p[1]) and %11000000) = %10000000) and
 >>        ((ord(p[2]) and %11000000) = %10000000) then ...
 >>     ...
 >> ------------
 >> In the above (current) code, 3 bytes are accessed, which may read past the terminating zero byte.
 > The "and" operator stops evaluating if left side is already false.

Only if you have a valid UTF-8 string.
I thought we were talking about *invalid* UTF-8 strings, where
p[2] may be accessed even though it is not part of the string.
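For illustration, here is a minimal Free Pascal sketch of the byte-by-byte approach discussed above. It is not the LCL's actual UTF8CharacterLength. The name SafeUTF8CharLen and its return values are hypothetical, and it assumes a #0-terminated buffer. Each continuation byte is read only after the previous one has passed the %10xxxxxx test. Since #0 never passes that test, a truncated sequence stops at the terminator; for example, p[2] is not read when p[1] is already the #0.

```pascal
program Utf8LenSketch;

{$mode objfpc}{$H+}

// Hypothetical helper, NOT the LCL's UTF8CharacterLength.
// Assumes p points into a #0-terminated buffer.
// Returns 0 for nil, 2..4 for a sequence whose continuation bytes
// all match %10xxxxxx, and 1 for ASCII (including #0), a stray
// continuation byte, an invalid lead byte, or a truncated sequence.
function SafeUTF8CharLen(p: PChar): Integer;
var
  Expected, i: Integer;
begin
  if p = nil then
    Exit(0);
  case Ord(p^) of
    $00..$7F: Exit(1);           // ASCII, including the #0 terminator
    $C0..$DF: Expected := 2;     // lead byte 110xxxxx
    $E0..$EF: Expected := 3;     // lead byte 1110xxxx
    $F0..$F7: Expected := 4;     // lead byte 11110xxx
  else
    Exit(1);                     // stray continuation byte or invalid lead byte
  end;
  // Read one byte per iteration; p[i] is only read after p[i-1]
  // was a continuation byte, and a continuation byte is never #0.
  for i := 1 to Expected - 1 do
    if (Ord(p[i]) and $C0) <> $80 then
      Exit(1);                   // invalid or truncated: stop here
  Result := Expected;
end;

const
  Euro: array[0..3] of Char = (#$E2, #$82, #$AC, #0); // U+20AC, 3 bytes
  Cut2: array[0..2] of Char = (#$E2, #$82, #0);       // truncated: p[2] is #0
  Cut1: array[0..1] of Char = (#$E2, #0);             // truncated: p[1] is #0

begin
  WriteLn(SafeUTF8CharLen(@Euro[0])); // 3
  WriteLn(SafeUTF8CharLen(@Cut2[0])); // 1
  WriteLn(SafeUTF8CharLen(@Cut1[0])); // 1, p[2] is never read
end.
```

Checking the bytes in a loop instead of one "and" chain means the bounds argument does not depend on the short-circuit ({$B-}) setting. The sketch only checks the lead and continuation bit patterns. Overlong forms such as $C0 $80 and surrogate code points are not rejected.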