[Lazarus] UTF8LengthFast returning incorrect results on AARCH64 (MacOS)
Marco van de Voort
fpc at pascalprogramming.org
Mon Dec 27 18:34:57 CET 2021
On 12/27/2021 at 4:39 PM, Bart via lazarus wrote:
> pn8^ =11100010 //first byte
> (pn8^ shr 7) =11111111 //<<-- I would have expected that to be 00000001 ?
That depends on whether pn8^ is signed; for a signed shift that result makes sense.
Defining it as pint8 (instead of puint8) is an odd choice.
The expression seems to be 1 when the top bits are 10, in other words when it is a
follow byte of UTF-8, which is what the comment says, and as far as I
can see the signedness doesn't matter.
Basically to me that seems to be a branchless version of
    if (p[i] and %11000000) = %10000000 then
      inc(result);
...which counts all UTF-8 follow bytes; that count is then subtracted from the
number of bytes in the string to find the number of UTF-8 sequences/codepoints.
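
To make the comparison concrete, here is a minimal sketch of both variants working on unsigned bytes. It is not the Lazarus source: the function names, the test data and the exact branchless expression are my own assumptions, chosen only to show that the two approaches count the same thing. The sample bytes start with $E2 (11100010), the first byte from the quoted output.

```pascal
program CountFollowBytes;
{$mode objfpc}{$H+}

// Hypothetical helpers for illustration, not the LazUTF8 implementation.

// Branching version: every follow byte (10xxxxxx) is subtracted from the byte count.
function CodepointsBranching(p: PByte; ByteCount: SizeInt): SizeInt;
var
  i: SizeInt;
begin
  Result := ByteCount;
  for i := 0 to ByteCount - 1 do
    if (p[i] and %11000000) = %10000000 then
      Dec(Result);  // follow byte: does not start a new codepoint
end;

// Branchless version on an unsigned value in 0..255:
// (b shr 7) is 1 when bit 7 is set, ((b xor $FF) shr 6) is 1 when
// bit 7 is set and bit 6 is clear, so the AND is 1 only for 10xxxxxx.
function CodepointsBranchless(p: PByte; ByteCount: SizeInt): SizeInt;
var
  i: SizeInt;
  b: Integer;
begin
  Result := ByteCount;
  for i := 0 to ByteCount - 1 do
  begin
    b := p[i];  // zero-extended, so no sign bits are involved
    Dec(Result, (b shr 7) and ((b xor $FF) shr 6));
  end;
end;

const
  // 'a', then the 3-byte sequence E2 82 AC, then 'b': 5 bytes, 3 codepoints.
  Data: array[0..4] of Byte = ($61, $E2, $82, $AC, $62);

begin
  WriteLn(CodepointsBranching(@Data[0], Length(Data)));   // prints 3
  WriteLn(CodepointsBranchless(@Data[0], Length(Data)));  // prints 3
end.
```

Because the byte is widened to a non-negative Integer before shifting, this sketch sidesteps the signed-shift question entirely; whether the real routine can do the same without losing its speed is a separate matter.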
Maybe the absolute declarations confuse something? Also make sure the input is
100% the same by printing the byte values of the input string.
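
A quick way to compare the input on both platforms is a small hex dump. This is only a sketch: DumpStringBytes is a hypothetical helper, and the three bytes assigned to S are a stand-in for whatever string actually goes into the failing call.

```pascal
program DumpInput;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: writes every byte of s as two hex digits.
// RawByteString is used so that no codepage conversion happens on the way in.
procedure DumpStringBytes(const s: RawByteString);
var
  i: SizeInt;
begin
  for i := 1 to Length(s) do
    Write(IntToHex(Ord(s[i]), 2), ' ');
  WriteLn;
end;

var
  S: RawByteString;
begin
  // Stand-in input; replace with the real test string.
  SetLength(S, 3);
  S[1] := #$E2;
  S[2] := #$82;
  S[3] := #$AC;
  DumpStringBytes(S);
end.
```

If the dumps differ between the x86_64 and AARCH64 builds, the problem is in how the string gets there and not in the counting loop.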