[Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]

John Landmesser jmlandmesser at gmx.de
Thu Dec 30 14:17:40 CET 2021


Perhaps useful test information from my PC:

******************************************
[john1 at manjaro sdb2]$ ./utf8lentest
234526968
fst:128406168
pop:128406168
add:128406168
asm:128406168

29315871

fst 1365
fst 1367
fst 1366
fst 1366
pop 9990
pop 9990


pop 9997
pop 9981
add 1386
add 1382
add 1386
add 1390
asm 346
asm 346
asm 346
asm 349
fst 1357
fst 1368
fst 1372
fst 1371
pop 10681
pop 6886
pop 6895
pop 6916
add 1247
add 1248
add 1250
add 1248
asm 295
asm 291
asm 291
asm 293
[john1 at manjaro sdb2]$
[john1 at manjaro sdb2]$ inxi -F
System:
   Host: manjaro Kernel: 5.10.84-1-MANJARO x86_64 bits: 64
     Desktop: Xfce 4.16.0 Distro: Manjaro Linux
Machine:
   Type: Laptop System: LENOVO product: 81RS v: Lenovo Yoga S740-14IIL
     serial: <superuser required>
   Mobo: LENOVO model: LNVNB161216 v: SDK0J40709 WIN
     serial: <superuser required> UEFI: LENOVO v: BYCN39WW date: 05/28/2021
Battery:
   ID-1: BAT0 charge: 62.4 Wh (95.6%) condition: 65.3/62.0 Wh (105.3%)
CPU:
   Info: quad core model: Intel Core i7-1065G7 bits: 64 type: MT MCP cache:
     L2: 2 MiB
   Speed (MHz): avg: 3520 min/max: 400/3900 cores: 1: 3543 2: 3890 3: 2319
     4: 3513 5: 3709 6: 3650 7: 3792 8: 3749
Graphics:
   Device-1: Intel Iris Plus Graphics G7 driver: i915 v: kernel
   Device-2: NVIDIA GP108M [GeForce MX250] driver: nvidia v: 495.44
   Device-3: Chicony Integrated Camera type: USB driver: uvcvideo
   Display: x11 server: X.Org 1.21.1.2 driver: loaded: modesetting,nvidia
     unloaded: nouveau resolution: 1: 1920x1080~60Hz 2: 1920x1080~60Hz
   Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
   Device-1: Intel Ice Lake-LP Smart Sound Audio driver: sof-audio-pci
   Sound Server-1: ALSA v: k5.10.84-1-MANJARO running: yes
   Sound Server-2: PipeWire v: 0.3.40 running: yes
Network:
   Device-1: Intel Ice Lake-LP PCH CNVi WiFi driver: iwlwifi
   IF: wlp0s20f3 state: up mac: 04:33:c2:02:de:51
   Device-2: Realtek RTL8153 Gigabit Ethernet Adapter type: USB
     driver: r8152
   IF: enp0s13f0u1u4 state: up speed: 1000 Mbps duplex: full
     mac: 4c:e1:73:42:1f:6b
   IF-ID-1: pan1 state: down mac: 7a:5c:6a:f4:06:56
Bluetooth:
   Device-1: Intel AX201 Bluetooth type: USB driver: btusb
   Report: rfkill ID: hci0 state: up address: see --recommends
Drives:
   Local Storage: total: 1.86 TiB used: 317.16 GiB (16.7%)
   ID-1: /dev/nvme0n1 vendor: Micron model: MTFDHBA1T0TCK size: 953.87 GiB
   ID-2: /dev/sda type: USB vendor: Western Digital model: WD10EARX-00N0YB0
     size: 931.51 GiB
   ID-3: /dev/sdb type: USB vendor: Kingston model: DataTraveler 2.0
     size: 14.54 GiB
Partition:
   ID-1: / size: 57.9 GiB used: 35.88 GiB (62.0%) fs: ext4
     dev: /dev/nvme0n1p8
   ID-2: /boot/efi size: 259.5 MiB used: 114.1 MiB (44.0%) fs: vfat
     dev: /dev/nvme0n1p1
Swap:
   ID-1: swap-1 type: partition size: 16.67 GiB used: 0 KiB (0.0%)
     dev: /dev/nvme0n1p9
Sensors:
   System Temperatures: cpu: 58.0 C mobo: N/A
   Fan Speeds (RPM): N/A
Info:
   Processes: 289 Uptime: 9m Memory: 15.2 GiB used: 2.19 GiB (14.4%)
   Shell: Bash inxi: 3.3.11



*****************************************
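
For readers who have not followed the whole thread: the rows above appear to
time several ways of counting UTF-8 code points in a buffer (the labels are
presumably a fast Pascal version, a popcnt-based one, an add-based one and the
assembler one). The sketch below is not the program from this thread. It is a
minimal C illustration of the general popcount idea, under the assumption that
the length is the number of bytes that are not continuation bytes (10xxxxxx).
All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: count every byte that is not a continuation byte. */
static size_t utf8_len_bytewise(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (s[i] & 0xC0) != 0x80;
    return count;
}

/* Portable population count (Kernighan's loop). A real build would more
   likely use a compiler builtin or the popcnt instruction. */
static unsigned popcount64(uint64_t x)
{
    unsigned c = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Word-at-a-time version: in each 8-byte block, mark the bytes that have
   bit 7 set and bit 6 clear, count the marks, and subtract the total
   from the byte count. */
static size_t utf8_len_popcount(const unsigned char *s, size_t n)
{
    size_t continuation = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* safe unaligned load */
        /* (w << 1) moves bit 6 of each byte into that byte's bit 7. */
        uint64_t marks = w & ~(w << 1) & UINT64_C(0x8080808080808080);
        continuation += popcount64(marks);
    }
    for (; i < n; i++)                 /* remaining tail bytes */
        continuation += (s[i] & 0xC0) == 0x80;
    return n - continuation;
}

/* Cross-check helper: both versions must agree, in the same spirit as the
   benchmark printing the same count for every variant. */
static size_t utf8_len_checked(const unsigned char *s, size_t n)
{
    size_t fast = utf8_len_popcount(s, n);
    assert(fast == utf8_len_bytewise(s, n));
    return fast;
}
```

As a usage example, the 17-byte UTF-8 encoding of "Klämpfl: grüße" contains
three continuation bytes, so utf8_len_checked returns 14 for it. The byte-wise
loop is kept only as a reference to validate the faster path.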






On 30.12.21 at 13:58, Marco van de Voort via lazarus wrote:
>
> On 30-12-2021 10:15, Florian Klämpfl via lazarus wrote:
>>
>> Linux uses different calling conventions, please check with the patch
>> below.
>>
> Linux is quite generous with the volatile registers, so luckily it
> matches quite closely.
>
> I first tried the approach of your patch, but [s] has problems on
> Windows and would require an ifdef on every use of "s", so I simply
> move [s] to rcx
>
>   {$ifndef Windows}
>   // We can't use [s] as an alias for the pointer parameter, because the
>   // non-assembler procedure on Windows changes that into a stack reference.
>   // FPC doesn't support non-volatile frame management for assembler procs
>   // like Delphi does.
>   mov rcx,s         // rdi
>   mov edx,len       // rsi
>   {$endif}
>
> and ifdef the assembler procedure on Linux vs. the inline asm block
> on Windows. Then it works on Linux x86_64.
>
> Funnily, our server's AMD Athlon 200GE (Zen1, 3.2GHz?) has nearly the
> exact same timings as my i7-3770 at 3.4GHz.
>
> I did some other minor work after last post, so here is now the entire
> program:
>


