[Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]
Florian Klämpfl
florian at freepascal.org
Thu Dec 30 10:15:25 CET 2021
On 30.12.21 at 08:23, Alexey Tor. via lazarus wrote:
>
>> New unit test, with Martin's integrated. If I play with godbolt, Ryzen
>> zen3 (ryzen 5x00X) is nearly twice as fast in cycles as my Ivy Bridge,
>> so I would like to see some benchmarks from various processors. Also
>> from very old ones (P4 and Clawhammers) to test instruction sets.
> Project utf8lentest raised exception class 'External: SIGSEGV'.
>
> In file 'utf8lentest.lpr' at line 89:
>
> movdqu xmm0, [rcx]
>
> OS: Linux x64. CPU:
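Linux uses a different calling convention than Windows (System V AMD64 rather than Win64), so the first parameter does not arrive in rcx there. Please check with the patch below.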
>
> vendor_id = "GenuineIntel"
> (simple synth) = Intel Core (unknown type) (Sandy Bridge
> D2/J1/Q0) {Sandy Bridge}, 32nm
>
15c15
< {define asmdebug}
---
> { $define asmdebug}
46c46
< function asmutf8length(const s : pchar;len:integer):int64;
---
> function asmutf8length(const s : pchar;len:int64):int64;assembler;nostackframe;
49d48
< begin
52c51
< mov r8,rdx
---
> mov r8,len
89c88
< movdqu xmm0, [rcx]
---
> movdqu xmm0, oword ptr [s]
95c94
< add rcx,16
---
> add s,16
128c127
< movzx r8d, byte [rcx] // unaligned bytes after sse loop
---
> movzx r8d, byte [s] // unaligned bytes after sse loop
135c134
< inc rcx
---
> inc s
140c139
< end['xmm5','xmm6']; // volatile registers used.
---
> ret
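
As an illustration of the calling-convention point, here is a minimal sketch, assuming FPC on x86-64 with Intel assembler syntax. FirstByte and msg are hypothetical names and are not part of utf8lentest.lpr. The operand is written as the parameter name rather than a fixed register, so the compiler should substitute rcx under Win64 and rdi under System V (Linux). The patch applies the same idea to s and len.

```pascal
program ccdemo;
{$mode objfpc}
{$asmmode intel}

const
  msg: pchar = 'A';   // hypothetical test input

{ Hypothetical helper: returns the first byte of s.
  Referring to the parameter by name lets FPC pick the register
  that the active ABI assigns to it, instead of hard-coding rcx. }
function FirstByte(const s: pchar): int64; assembler; nostackframe;
asm
  movzx eax, byte ptr [s]   // writing eax zero-extends into rax, the int64 result register
end;

begin
  writeln(FirstByte(msg));
end.
```

The same source should then build unchanged on Windows and Linux, which is the property the hard-coded rcx/rdx version lacked.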