[Lazarus] UTF8LengthFast returning incorrect results on AARCH64 (MacOS)

Marco van de Voort fpc at pascalprogramming.org
Mon Dec 27 18:34:57 CET 2021


On 12/27/2021 at 4:39 PM, Bart via lazarus wrote:
> pn8^              =11100010   //first byte
> (pn8^ shr 7)      =11111111  //<<-- I would have expected that to be 00000001 ?

That depends on whether pn8^ is signed or not; for a signed shift the result makes sense. 
The definition as pint8 (instead of puint8) is an odd choice.
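
To see what a given compiler and target actually do here, a small probe along these lines might help. It is only a sketch: the program name and variable names are made up for illustration, it uses PShortInt/PByte as stand-ins for the signed and unsigned 8-bit pointer types, and the output is deliberately not stated because it is the very thing in question.

```pascal
program ShiftProbe;
{$mode objfpc}{$H+}

var
  b: Byte = $E2;        // 11100010, the first byte from the quoted example
  ps: PShortInt;        // signed 8-bit view of the byte
  pu: PByte;            // unsigned 8-bit view of the same byte
  viaSigned, viaUnsigned: LongInt;
begin
  ps := PShortInt(@b);
  pu := @b;

  // Same memory, same shift; only the declared signedness differs.
  viaSigned := ps^ shr 7;
  viaUnsigned := pu^ shr 7;

  // Print the low 8 bits of each result for comparison.
  WriteLn('signed   view: ', BinStr(viaSigned, 8));
  WriteLn('unsigned view: ', BinStr(viaUnsigned, 8));
end.
```

Running the same probe on both the x86_64 and the AARCH64 build would show whether the two targets agree.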

The expression seems to be 1 when the top bits are 10, in other words when 
the byte is a UTF-8 follow byte. That is what the comment says, and as far 
as I can see the signedness doesn't matter.

Basically to me that seems to be a branchless version of

if (p[i] and %11000000)=%10000000 then
    inc(result);

...which counts all UTF-8 follow bytes, and then subtracts that count from 
the number of bytes in the string to find the number of UTF-8 sequences/codepoints.
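
As a reference to compare against, the counting scheme described above can be sketched byte by byte. This is not the Lazarus UTF8LengthFast code: CountUtf8Codepoints is a hypothetical helper that handles one unsigned byte at a time, and it assumes the input is valid UTF-8.

```pascal
program CountCodepoints;
{$mode objfpc}{$H+}

// Hypothetical reference helper, not the Lazarus implementation.
// Assumes s holds valid UTF-8.
function CountUtf8Codepoints(const s: RawByteString): SizeInt;
var
  i, followBytes: SizeInt;
begin
  followBytes := 0;
  for i := 1 to Length(s) do
    // Branchless form of the "if": Ord() turns the comparison into 0 or 1.
    // Ord(s[i]) is an unsigned value, so signedness plays no role here.
    Inc(followBytes, Ord((Ord(s[i]) and %11000000) = %10000000));
  // Every byte that is not a follow byte starts a new sequence.
  Result := Length(s) - followBytes;
end;

begin
  // E2 82 AC is one three-byte sequence (two follow bytes), then 'a'.
  WriteLn(CountUtf8Codepoints(#$E2#$82#$AC'a'));
end.
```

If the result of a slow reference like this differs from UTF8LengthFast on one platform only, that points at the optimized routine rather than at the input.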


Maybe the use of absolute confuses things somehow? Also make sure the input 
is 100% the same on both platforms by printing the values of the bytes of the input string.
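
Printing the bytes could look roughly like this. DumpBytesHex is a hypothetical helper name, and the literal in the main block is only a placeholder for the real test string.

```pascal
program DumpBytes;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: writes every byte of s as two hex digits.
procedure DumpBytesHex(const s: RawByteString);
var
  i: SizeInt;
begin
  for i := 1 to Length(s) do
    Write(IntToHex(Ord(s[i]), 2), ' ');
  WriteLn;
end;

begin
  // Placeholder input; substitute the string that gives the wrong length.
  DumpBytesHex(#$E2#$82#$AC'a');
end.
```

Comparing the dump from both machines rules out a difference in source file encoding or string conversion before the shift logic is blamed.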


