[Lazarus] UTF8LengthFast returning incorrect results on AARCH64 (MacOS)
Noel Duffy
noelduffy at xtra.co.nz
Mon Dec 27 00:43:56 CET 2021
I need some help getting to the root of a problem with the function
UTF8LengthFast in lazutf8, which returns incorrect results on Apple
hardware (M1, aarch64).
On MacOS, when given a string containing one or more multi-byte UTF-8
characters, UTF8LengthFast returns wildly incorrect results. On Fedora,
the function returns the correct answer.
On the Apple machine, I'm using fpc 3.3.1 and Lazarus 2.2.0RC3. On
Fedora, I'm using fpc 3.2.2-1 and Lazarus 2.0.12-2.
The following small program demonstrates the problem:
% cat utf8len.pas
program utf8len;
{$mode objfpc}{$H+}
{$CODEPAGE UTF8}
uses SysUtils, lazutf8;
const
s = '€';
var
n: PtrInt;
begin
n := UTF8LengthFast(s);
writeln('Len='+inttostr(n));
end.
% file utf8len.pas
utf8len.pas: Unicode text, UTF-8 text
To compile it on MacOS, I use:
% ~/fpc3.3.1/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/local/share/lazarus/components/lazutils/lib/aarch64-darwin utf8len.pas
On Fedora, with this:
$ /usr/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/lib64/lazarus/components/lazutils utf8len.pas
Then run it:
On MacOS:
% ./utf8len
Len=-100663283
On Fedora:
$ ./utf8len
Len=1
On MacOS, I built fpc 3.3.1 from source using fpc 3.2.2, and then
compiled Lazarus with fpc 3.3.1.
Because I built both fpc and Lazarus myself, I may have introduced an
error or a bug somewhere. To rule that out, can anyone else reproduce
this problem?
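A slightly extended version of the test program may make comparison easier. This is a sketch, not something from the Lazarus sources; the program name utf8len2 and the Compare procedure are made up for illustration. It prints the compiler target and version, then the byte count, UTF8Length and UTF8LengthFast for a few strings. The strings are built from explicit bytes (#$E2#$82#$AC is U+20AC, the euro sign) so the results do not depend on the source file's encoding or on {$CODEPAGE}.

```pascal
program utf8len2;  // hypothetical extended test, not part of Lazarus
{$mode objfpc}{$H+}
uses SysUtils, LazUTF8;

const
  Euro = #$E2#$82#$AC;  // U+20AC EURO SIGN, written as its raw UTF-8 bytes

// Print the byte count plus both lazutf8 length functions for one string,
// so UTF8Length can serve as a reference value for UTF8LengthFast.
procedure Compare(const s: string);
begin
  WriteLn('bytes=', Length(s),
          ' UTF8Length=', UTF8Length(s),
          ' UTF8LengthFast=', UTF8LengthFast(s));
end;

begin
  // Compile-time info, so reports from different machines can be told apart.
  WriteLn('Target: ', {$I %FPCTARGETCPU%}, '-', {$I %FPCTARGETOS%},
          ', FPC ', {$I %FPCVERSION%});
  Compare('abc');
  Compare(Euro);
  Compare('Price: ' + Euro + Euro + ' and ' + Euro);  // 21 bytes, 15 characters
end.
```

On a correct build the byte counts should be 3, 3 and 21, and both length columns should read 3, 1 and 15. If the byte counts match but the two length columns disagree, that narrows the problem to UTF8LengthFast (or the code generated for it).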
I have traced through the code with a debugger on both platforms. Both
follow the same path through UTF8LengthFast, but the final loop, which
tests the bytes using bitwise shifts and masks, produces different
results. I don't understand the function's algorithm well enough to see
easily what's going on.
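For reading the function, it may help to see the general idea in isolation. The sketch below is not the lazutf8 source; it is a standalone illustration, with hypothetical names UTF8CountSimple and UTF8CountWords, of counting characters as "bytes minus continuation bytes", where a continuation byte has the bit pattern 10xxxxxx. The first function tests one byte at a time. The second handles eight bytes per step with masks and shifts, then sums the per-byte flags with a single multiplication. All the word arithmetic here uses the unsigned QWord type so that every shr is a logical shift; whether the real function's types behave the same way is one detail worth comparing.

```pascal
program utf8count;  // hypothetical illustration, not the lazutf8 code
{$mode objfpc}{$H+}

// Reference count: every byte starts a new character unless it is a
// continuation byte, i.e. unless its top two bits are 10 (10xxxxxx).
function UTF8CountSimple(const s: string): PtrInt;
var
  i: PtrInt;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Word-at-a-time sketch: count continuation bytes 8 at a time using
// unsigned QWord arithmetic, then subtract them from the byte count.
function UTF8CountWords(const s: string): PtrInt;
const
  OneMask    = QWord($0101010101010101);
  EightyMask = QWord($8080808080808080);
var
  i, n, cont: PtrInt;
  w, x: QWord;
begin
  n := Length(s);
  cont := 0;
  i := 1;
  while i + 7 <= n do
  begin
    Move(s[i], w, SizeOf(w));  // copy 8 bytes; avoids unaligned reads
    // Bit 0 of each byte of x is 1 iff that byte has bit 7 set and bit 6
    // clear; the AND with the first term discards all other bits.
    x := ((w and EightyMask) shr 7) and ((not w) shr 6);
    {$push}{$Q-}{$R-}
    // Multiplying by $0101... adds all eight bytes of x into the top byte.
    // Each byte of x is 0 or 1, so partial sums stay <= 8 and never carry.
    Inc(cont, PtrInt((x * OneMask) shr 56));
    {$pop}
    Inc(i, SizeOf(w));
  end;
  // Left-over bytes (fewer than 8), one at a time.
  while i <= n do
  begin
    if (Ord(s[i]) and $C0) = $80 then
      Inc(cont);
    Inc(i);
  end;
  Result := n - cont;
end;

procedure Report(const Name, s: string);
begin
  WriteLn(Name, ': bytes=', Length(s),
          ' simple=', UTF8CountSimple(s),
          ' words=', UTF8CountWords(s));
end;

const
  Euro = #$E2#$82#$AC;  // U+20AC EURO SIGN as raw UTF-8 bytes

begin
  Report('ascii', 'abc');
  Report('euro', Euro);
  Report('mixed', 'x' + Euro + 'y');
  Report('long', 'Price: ' + Euro + Euro + ' and ' + Euro);  // exercises the 8-byte loop
end.
```

Compiled with the same options as above (including -Cro), both columns should read 3, 1, 3 and 15 for the four strings. The "long" string is 21 bytes, so it goes through two 8-byte blocks and a 5-byte tail.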