[Lazarus] UTF8LengthFast returning incorrect results on AARCH64 (MacOS)

Noel Duffy noelduffy at xtra.co.nz
Mon Dec 27 00:43:56 CET 2021


I need some help getting to the root of a problem with incorrect results 
on Apple hardware (M1, aarch64) for the function UTF8LengthFast in lazutf8.

On MacOS, when given a string containing one or more non-ASCII 
(multi-byte) UTF-8 characters, UTF8LengthFast returns wildly incorrect 
results. On Fedora, the function returns the correct answer.

On Apple, I'm using fpc 3.3.1, and Lazarus is 2.2.0RC3. On Fedora, 
Lazarus is 2.0.12-2, and fpc is 3.2.2-1.

The following small program demonstrates the problem here.

% cat utf8len.pas

program utf8len;

{$mode objfpc}{$H+}
{$CODEPAGE UTF8}

uses SysUtils, lazutf8;

const
   s =  '€';
var
   n: PtrInt;
begin
   n := UTF8LengthFast(s);
   writeln('Len='+inttostr(n));
end.

% file utf8len.pas
utf8len.pas: Unicode text, UTF-8 text

To compile this, on MacOS I use this:

% ~/fpc3.3.1/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/local/share/lazarus/components/lazutils/lib/aarch64-darwin utf8len.pas

On Fedora, with this:

$ /usr/bin/fpc -Sh -Cro -O3 -XX -vewbq -FU. -Fu/usr/lib64/lazarus/components/lazutils utf8len.pas

Then run it:

On MacOS:

% ./utf8len
Len=-100663283

On Fedora:

$ ./utf8len
Len=1

On MacOS, I built fpc 3.3.1 from source using fpc 3.2.2, and then 
compiled Lazarus with the resulting fpc 3.3.1.

Because I built both fpc and Lazarus myself, I may have introduced an 
error somewhere along the way. To rule that out, can anyone else 
reproduce this problem?

I have traced through the code using a debugger on both platforms. Both 
follow the same path through UTF8LengthFast, but the final loop, which 
does bitwise shifting of bytes, produces different results. I don't 
understand the algorithm the function uses well enough to see what's 
going wrong.
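
As a cross-check, the sketch below counts code points the slow way. In 
UTF-8, every byte that is not a continuation byte (bit pattern 10xxxxxx) 
starts a new code point. CountCodePoints and the Euro byte array are 
names made up for this illustration; they are not part of lazutf8. The 
program assumes lazutf8 provides the (p: PChar; ByteCount: PtrInt) 
overload of UTF8LengthFast, and it needs the same -Fu path as the 
compile commands above. The string is given as raw bytes so that the 
result does not depend on source-file encoding or code page settings.

```pascal
program utf8check;

{$mode objfpc}{$H+}

uses lazutf8;

const
  // UTF-8 encoding of U+20AC (euro sign) as raw bytes.
  Euro: array[0..2] of Byte = ($E2, $82, $AC);

// Hypothetical reference counter (not part of lazutf8): counts every
// byte that is NOT a continuation byte, i.e. not of the form 10xxxxxx.
function CountCodePoints(p: PByte; ByteCount: PtrInt): PtrInt;
var
  i: PtrInt;
begin
  Result := 0;
  for i := 0 to ByteCount - 1 do
    if (p[i] and $C0) <> $80 then
      Inc(Result);
end;

begin
  // The reference counter should report 1 for the three euro bytes.
  writeln('Reference      = ', CountCodePoints(@Euro[0], Length(Euro)));
  // Assumed overload of UTF8LengthFast taking a pointer and a byte count.
  writeln('UTF8LengthFast = ', UTF8LengthFast(PChar(@Euro[0]), Length(Euro)));
end.
```

If the two lines disagree on one platform only, that would point at the 
optimized routine (or the code the compiler generates for it) rather 
than at how the string constant was encoded.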




