[Lazarus] UTF8LengthFast returning incorrect results on AARCH64 (MacOS)

Bart bartjunk64 at gmail.com
Mon Dec 27 16:39:05 CET 2021


On Mon, Dec 27, 2021 at 3:41 PM Juha Manninen via lazarus
<lazarus at lists.lazarus-ide.org> wrote:

> It must be a Big endian / Little endian issue. IIRC it can be adjusted in ARM CPUs.
> Why do MacOS and Linux use a different setting there? I have no idea.

On second thought: if the function returns garbage for just a single
'€', the code should not even enter the part where it handles
blocks of size PtrInt and does masking with EIGHTYMASK etc. (the part
of the code that might be endianness dependent), since the 3 bytes of
'€' cannot fill even one PtrInt-sized block.
It should go to one of the two loops that simply do:
  Result += (pn8^ shr 7) and ((not pn8^) shr 6);
That part should not depend on endianness at all.
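A minimal Free Pascal sketch of that byte-at-a-time test follows. CountCodePointsBytewise is a hypothetical stand-alone helper, not part of LazUTF8. It applies the quoted expression to unsigned bytes and subtracts the number of continuation bytes from the byte count. Because each byte is examined on its own, the byte order of multi-byte integers never comes into play.

```pascal
program ByteLoopSketch;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Hypothetical helper, not the LazUTF8 source: counts UTF-8 code points
// by subtracting the continuation bytes (10xxxxxx) from the byte count,
// looking at one byte at a time, so byte order never matters.
function CountCodePointsBytewise(const Bytes: array of Byte): SizeInt;
var
  i, Continuations: SizeInt;
  b: Byte;
begin
  Continuations := 0;
  for i := Low(Bytes) to High(Bytes) do
  begin
    b := Bytes[i];
    // b is unsigned, so (b shr 7) is exactly bit 7: 0 or 1.
    // The lowest bit of ((not b) shr 6) is the inverse of bit 6.
    // The AND is therefore 1 only for a byte of the form 10xxxxxx.
    Continuations := Continuations + ((b shr 7) and ((not b) shr 6));
  end;
  Result := Length(Bytes) - Continuations;
end;

begin
  WriteLn(CountCodePointsBytewise([$E2, $82, $AC]));           // '€': prints 1
  WriteLn(CountCodePointsBytewise([$61, $E2, $82, $AC, $62])); // 'a€b': prints 3
end.
```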

On Win32 a single '€' gives something like this:

pn8^              =11100010   //first byte
(pn8^ shr 7)      =11111111   //<<-- I would have expected this to be 00000001?
(not pn8^)        =00011101
(not pn8^) shr 6  =00000000
Add: (pn8^ shr 7) and ((not pn8^) shr 6)=0

pn8^              =10000010   //second byte
(pn8^ shr 7)      =11111111
(not pn8^)        =01111101
(not pn8^) shr 6  =00000001
Add: (pn8^ shr 7) and ((not pn8^) shr 6)=1

pn8^              =10101100   //third and last byte of '€'
(pn8^ shr 7)      =11111111
(not pn8^)        =01010011
(not pn8^) shr 6  =00000001
Add: (pn8^ shr 7) and ((not pn8^) shr 6)=1
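The 11111111 in the trace is consistent with pn8 pointing to a signed 8-bit type such as Int8; that is an assumption here, not something stated above. In Free Pascal an integer smaller than the native size is promoted to a signed native-size integer before shr, and shr is a logical shift, so for a byte with the high bit set the low 8 bits of (pn8^ shr 7) come out as all ones. The sum stays correct anyway, because the other operand of the AND is 0 or 1 in exactly those cases. The sketch below reproduces the trace; IsContinuation is a hypothetical name.

```pascal
program SignedShrSketch;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Assumption: pn8 points to a signed 8-bit value (Int8).
// IsContinuation is a hypothetical name for the per-byte expression.
function IsContinuation(v: Int8): SizeInt;
begin
  // v is sign-extended to a native-size signed integer before shr, and
  // shr is a logical shift. For negative v the low 8 bits of (v shr 7)
  // are therefore all ones, not 00000001.
  // Still, for negative v, (not v) lies in 0..127 so ((not v) shr 6) is
  // 0 or 1; for non-negative v, (v shr 7) is 0. The AND is always 0 or 1.
  Result := (v shr 7) and ((not v) shr 6);
end;

var
  EuroBytes: array[0..2] of Byte = ($E2, $82, $AC); // '€' as UTF-8
  i: Integer;
  v: Int8;
begin
  for i := Low(EuroBytes) to High(EuroBytes) do
  begin
    v := Int8(EuroBytes[i]);
    // Low 8 bits of (v shr 7), then the value that gets added.
    WriteLn(BinStr(Byte(v shr 7), 8), '  ', IsContinuation(v));
  end;
end.
```

Compiled with FPC, this should print 11111111 for all three bytes, next to 0, 1 and 1, matching the Win32 trace.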

By the way,
I find the code in UTF8LengthFast difficult to read.
Personally I dislike the C-isms += and >> (even more so when both >>
and shr are used).
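For comparison, here is a sketch of the same continuation-byte test written without += or shifts, using Inc and an explicit mask comparison. CountCodePointsReadable and its constants are hypothetical names. This is not a proposed patch and has not been benchmarked against UTF8LengthFast.

```pascal
program ReadableSketch;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Hypothetical, readability-oriented variant (not the LazUTF8 code).
function CountCodePointsReadable(const Bytes: array of Byte): SizeInt;
const
  ContinuationMask = $C0; // top two bits of a byte
  ContinuationTag  = $80; // 10xxxxxx marks a continuation byte
var
  i, Continuations: SizeInt;
begin
  Continuations := 0;
  for i := Low(Bytes) to High(Bytes) do
    if (Bytes[i] and ContinuationMask) = ContinuationTag then
      Inc(Continuations);
  // Every byte that is not a continuation byte starts a code point.
  Result := Length(Bytes) - Continuations;
end;

begin
  WriteLn(CountCodePointsReadable([$E2, $82, $AC]));           // '€': prints 1
  WriteLn(CountCodePointsReadable([$61, $E2, $82, $AC, $62])); // 'a€b': prints 3
end.
```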

-- 
Bart

