[Lazarus] Faster than popcnt [[Re: UTF8LengthFast returning incorrect results on AARCH64 (MacOS)]]
Marco van de Voort
fpc at pascalprogramming.org
Wed Dec 29 17:19:44 CET 2021
On 29-12-2021 16:30, Martin Frb via lazarus wrote:
>
>
>>
>> Could you post full source if you haven't already? For a bit of
>> benchmarking. I just wrote it from the top of my head, and I assumed
>> 5 instructions for 16-byte would win any time, but haven't verified
>> anything yet.
> I had it attached on my last mail. Attached it again here. (3rd
> procedure / "Utf8LengthAdd")
>
> It is only 64bit for now. (And not cleaned up in any way).
>
> Also changing "bc >> 7" and "bc and 127"
> to "moddiv(bc, 255, full, remain)" might save a few more ms. But
> probably needs larger data to benchmark.
>
> If you do work on this, feel free to integrate my code as the baseline
> for cpu without SSE.
> Otherwise, it might be a bit until I get to it.
>
First results (on an ageing i7-3770, trunk FPC, -O4 -Cpcoreavx), four runs each:

        run 1   run 2   run 3   run 4
fst       781     781     797     766
pop       656     641     640     641
add       562     578     563     594
asm       297     296     297     297
The asm version is nearly fully functional. More importantly, the
remaining issues are constant-time, single-instruction fixes, so they
shouldn't influence the benchmark for anything other than the shortest
sequences.
I'll finish up and post the whole shebang, since more eyes could help;
I'm an asm amateur in some regards.
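
For anyone reading along without the attachment: the routines in this thread
compute the UTF-8 length of a buffer, which amounts to counting the bytes that
are not continuation bytes (bit pattern 10xxxxxx). What follows is only a
rough C sketch of the word-at-a-time idea, 8 bytes per step with one small
counter per byte lane. It is not the attached Utf8LengthAdd code and not the
asm version; the function names, the 31-word block size and the use of C are
assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: every byte that is not 10xxxxxx starts a code point. */
size_t utf8_length_scalar(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (s[i] & 0xC0) != 0x80;
    return count;
}

/* Hypothetical word-at-a-time variant: counts continuation bytes in
   8 parallel byte lanes and subtracts the total from the byte length. */
size_t utf8_length_swar(const unsigned char *s, size_t n)
{
    const uint64_t ones = 0x0101010101010101ULL;
    size_t continuation = 0;
    size_t i = 0;

    while (n - i >= 8) {
        /* Each lane grows by at most 1 per word. With at most 31 words
           per block, the sum of all 8 lanes stays <= 248 and fits in
           one byte, so the multiply below cannot overflow a lane. */
        size_t words = (n - i) / 8;
        if (words > 31)
            words = 31;

        uint64_t acc = 0;
        for (size_t k = 0; k < words; k++) {
            uint64_t w;
            memcpy(&w, s + i, sizeof w);   /* safe unaligned load */
            /* Lane bit 0 = (bit 7 set) AND (bit 6 clear). */
            acc += (w >> 7) & (~w >> 6) & ones;
            i += 8;
        }
        /* Horizontal add: the top byte of acc * ones is the lane sum. */
        continuation += (size_t)((acc * ones) >> 56);
    }

    /* Tail of fewer than 8 bytes. */
    for (; i < n; i++)
        continuation += (s[i] & 0xC0) == 0x80;

    return n - continuation;
}
```

Because all eight lanes are summed, the result does not depend on byte order.
Like the routines discussed here, the sketch assumes valid UTF-8 and does no
validation.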