[Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

Florian Klämpfl florian at freepascal.org
Thu Dec 30 10:15:25 CET 2021


On 30.12.21 at 08:23, Alexey Tor. via lazarus wrote:
> 
>> New unit test, with Martin's integrated. If I play with godbolt, Ryzen 
>> zen3 (ryzen 5x00X) is nearly twice as fast in cycles as my Ivy Bridge, 
>> so I would like to see some benchmarks from various processors. Also 
>> from very old ones (P4 and Clawhammers) to test instruction sets. 
> Project utf8lentest raised exception class 'External: SIGSEGV'.
> 
>   In file 'utf8lentest.lpr' at line 89:
> 
> movdqu xmm0, [rcx]
> 
> OS: Linux x64. CPU:

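Linux uses a different calling convention than Windows: on x86-64 the
first two integer arguments arrive in rdi and rsi, not rcx and rdx, so
the hard-coded rcx does not hold the string pointer. Please test with
the patch below.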

> 
>     vendor_id = "GenuineIntel"
>        (simple synth)  = Intel Core (unknown type) (Sandy Bridge 
> D2/J1/Q0) {Sandy Bridge}, 32nm
> 


15c15
< {define asmdebug}
---
 > { $define asmdebug}
46c46
< function asmutf8length(const s : pchar;len:integer):int64;
---
 > function asmutf8length(const s : pchar;len:int64):int64;assembler;nostackframe;
49d48
< begin
52c51
<     mov r8,rdx
---
 >     mov r8,len
89c88
<   movdqu xmm0, [rcx]
---
 >   movdqu xmm0, oword ptr [s]
95c94
<   add rcx,16
---
 >   add s,16
128c127
<   movzx r8d, byte [rcx]        // unaligned bytes after sse loop
---
 >   movzx r8d, byte [s]          // unaligned bytes after sse loop
135c134
<   inc rcx
---
 >   inc s
140c139
< end['xmm5','xmm6']; // volatile registers used.
---
 >   ret
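
The idea behind the patch above, referring to parameters by name rather
than by a fixed register, can be tried in isolation with a minimal sketch
like the following. FirstByte and Sample are hypothetical names that are
not part of utf8lentest.lpr, and the sketch assumes an x86-64 target and
Free Pascal's Intel assembler mode.

```pascal
program callconvsketch;
{$mode objfpc}
{$asmmode intel}

// Hypothetical helper, not taken from utf8lentest.lpr.
// Returns the first byte of s, zero-extended to 64 bits.
// The parameter is referenced by name, so the compiler substitutes the
// register the target ABI uses for the first integer argument
// (rcx on Win64, rdi on System V targets such as Linux).
function FirstByte(const s: PChar): Int64; assembler; nostackframe;
asm
  movzx eax, byte ptr [s]   // writing eax also clears the upper half of rax
end;

const
  Sample: PChar = 'A';      // illustrative input only

begin
  // Same source on Windows and Linux; no register name is hard-coded.
  WriteLn(FirstByte(Sample));
end.
```

Writing [rcx] instead of [s] in the sketch would read the pointer from
the wrong register on Linux, which is consistent with the SIGSEGV
reported at the movdqu instruction.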


