[Lazarus] Improving UTF8CharacterLength?
Jürgen Hestermann
juergen.hestermann at gmx.de
Thu Aug 13 13:23:28 CEST 2015
On 2015-08-13 at 13:01, Mattias Gaertner wrote:
> On Thu, 13 Aug 2015 12:38:00 +0200
> Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
>> On 2015-08-13 at 11:55, Mattias Gaertner wrote:
>> > A string always ends with a #0, so checking byte by byte makes sure you
>> > stay within range.
>> Not quite true:
>> ------------
>> if ((ord(p^) and %11110000) = %11100000) then
>> begin // could be a 3-byte character
>>   if ((ord(p[1]) and %11000000) = %10000000) and
>>      ((ord(p[2]) and %11000000) = %10000000) then ...
>> ...
>> ------------
>> The current code above accesses 3 bytes, which may read past the terminating zero byte.
> The "and" operator stops evaluating if left side is already false.
Only if you have a valid UTF-8 string.
I thought we were talking about *invalid* UTF-8 strings, where
p[2] can be accessed even though it is not part of the string.
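
One way to check this for a concrete input is to record which offsets the quoted condition actually reads. The following is only a minimal sketch: HighestOffsetRead is a hypothetical helper written for illustration, not the LazUTF8 routine, and it assumes a #0-terminated buffer and short-circuit boolean evaluation ({$B-}, the Free Pascal default). With complete boolean evaluation ({$B+}) both operands of "and" would be evaluated, so p[2] would be read even when p[1] fails the test.

```pascal
program Utf8TailCheck;
{$mode objfpc}{$H+}
{$B-} // assumption: short-circuit boolean evaluation (FPC default)

// Hypothetical helper: returns the highest offset from p that the
// quoted 3-byte check reads, mirroring its nested conditions.
function HighestOffsetRead(p: PChar): Integer;
begin
  Result := 0; // p[0] is always read
  if (Ord(p^) and %11110000) = %11100000 then
  begin
    Result := 1; // p[1] is read once p[0] looks like a 3-byte lead
    if (Ord(p[1]) and %11000000) = %10000000 then
      Result := 2; // p[2] is read only after p[1] passed the test
  end;
end;

var
  s: string;
begin
  // Lead byte only: p[1] is the terminating #0, which is not a
  // continuation byte, so p[2] is not read.
  s := Chr($E2);
  WriteLn(HighestOffsetRead(PChar(s)), ' of ', Length(s));

  // Lead byte plus one continuation byte: p[2] is the terminating #0.
  s := Chr($E2) + Chr($82);
  WriteLn(HighestOffsetRead(PChar(s)), ' of ', Length(s));

  // Complete sequence (euro sign): all three bytes belong to the string.
  s := Chr($E2) + Chr($82) + Chr($AC);
  WriteLn(HighestOffsetRead(PChar(s)), ' of ', Length(s));
end.
```

Each line prints the highest offset read followed by the string length. Offset Length(s) is the terminating #0, so a read counts as behind the terminator only if the first number is larger than the second.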