[Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

Martin Frb lazarus at mfriebe.de
Thu Dec 30 15:09:04 CET 2021


On 30/12/2021 14:43, Marco van de Voort via lazarus wrote:
> Compile with -O4 -Cpcoreavx2 , the others (non asm) will become 
> faster, my guess is  "add" will be about double of asm.

CPU: Core i7 8700K

Compilers:
fpc 3.3.1 from Dec 10th
fpc 3.2.3 from Dec 9th

With fpc 3.3.1 (compared to 3.2.3):
- fst gets slower? (about 690 vs 580)
- add gets faster (about 440 vs 585)

-O4 -Cpcoreavx2

fpc 3.2.3     fpc 3.3.1

fst 594       fst 688
fst 578       fst 703
fst 578       fst 687
fst 562       fst 688

pop 485       pop 485
pop 500       pop 500
pop 500       pop 484
pop 484       pop 500

add 594       add 422
add 578       add 438
add 578       add 437
add 594       add 453

asm 250       asm 250
asm 250       asm 250
asm 250       asm 250
asm 250       asm 266
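
To put numbers on the fst/add observations, here is a small sketch that averages the four runs per variant from the -O4 -Cpcoreavx2 table and prints the relative change from fpc 3.2.3 to 3.3.1. It is only arithmetic on the figures posted above; the names `timings`, `summarize` and `add_over_asm` are made up for this illustration, and the values are treated as plain numbers in whatever unit the benchmark reports.

```python
from statistics import mean

# Timings copied from the -O4 -Cpcoreavx2 table above
# (unit as reported by the benchmark; lower is faster).
timings = {
    "fst": {"3.2.3": [594, 578, 578, 562], "3.3.1": [688, 703, 687, 688]},
    "pop": {"3.2.3": [485, 500, 500, 484], "3.3.1": [485, 500, 484, 500]},
    "add": {"3.2.3": [594, 578, 578, 594], "3.3.1": [422, 438, 437, 453]},
    "asm": {"3.2.3": [250, 250, 250, 250], "3.3.1": [250, 250, 250, 266]},
}


def summarize(data):
    """Return {variant: (mean on 3.2.3, mean on 3.3.1, percent change)}."""
    out = {}
    for name, runs in data.items():
        old = mean(runs["3.2.3"])
        new = mean(runs["3.3.1"])
        out[name] = (old, new, 100.0 * (new - old) / old)
    return out


summary = summarize(timings)
for name, (old, new, pct) in summary.items():
    print(f"{name}: {old:.2f} -> {new:.2f} ({pct:+.1f}%)")

# Ratio of the "add" mean to the "asm" mean on fpc 3.3.1,
# to compare against the "about double of asm" guess.
add_over_asm = summary["add"][1] / summary["asm"][1]
print(f"add/asm on 3.3.1: {add_over_asm:.2f}")
```

On these figures fst comes out roughly 20% slower and add roughly 25% faster with 3.3.1, pop and asm are essentially unchanged, and add lands at about 1.7 times asm.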



fpc 3.2.3
-O4 -Cpcoreavx           -O4 -CpCOREI

fst 594                  fst 593
fst 578                  fst 579
fst 578                  fst 562
fst 594                  fst 578

pop 500                  pop 500
pop 515                  pop 500
pop 500                  pop 500
pop 485                  pop 485

add 593                  add 593
add 579                  add 578
add 578                  add 594
add 593                  add 594

asm 250                  asm 250
asm 250                  asm 250
asm 235                  asm 250
asm 250                  asm 250



