arm - Inline AArch64 assembly: UMOV source syntax
Below is my attempt at implementing a fast popcount for AArch64 using NEON:
#include <stdio.h>

int count_bits(unsigned long long val)
{
    unsigned long long p = 0;
    int c = 0;

    __asm__("dup %0.2d, %2 \n\t"
            "cnt %0.8b, %0.8b \n\t"
            "addp d0, %0.2d \n\t"
            "umov %1, d0 \n\t"
            : "+w"(p), "+r"(c)
            : "r"(val)
            : "d0");

    return c;
}

int main(int argc, const char *argv[])
{
    printf("test: %i\n", count_bits(-1ull));
    return 0;
}
and the error:
$ gcc test.c -o test
Error: operand 2 should be a SIMD vector element -- `umov x0,d0'
I'm also not sure about the addp instruction: its specifier suggests that it adds 2 dwords, whereas the result of the cnt instruction is stored as 8 bytes (using %0.8b in addp doesn't work). Shouldn't I rather use uadalp for summing the components?
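To make the concern about addp concrete, here is a small scalar model in plain C. It is only a sketch: the helper names (model_cnt_8b, model_addp_2d, model_sum_bytes) are made up for illustration, and it assumes that cnt on the .8b arrangement leaves the upper 64 bits of the register zero. It shows that adding the two 64-bit lanes leaves the per-byte counts packed next to each other instead of summing them.

```c
#include <stdint.h>

/* Hypothetical scalar model of `cnt Vd.8b, Vn.8b`: every byte of the
   result holds the number of set bits in the matching input byte. */
uint64_t model_cnt_8b(uint64_t lanes)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t byte = (lanes >> (8 * i)) & 0xff;
        uint64_t bits = 0;
        while (byte) {
            bits += byte & 1;
            byte >>= 1;
        }
        out |= bits << (8 * i);
    }
    return out;
}

/* Hypothetical model of `addp Dd, Vn.2d`: the two 64-bit lanes are
   added as whole integers, so byte boundaries are ignored. */
uint64_t model_addp_2d(uint64_t lo, uint64_t hi)
{
    return lo + hi;
}

/* What the popcount actually needs: a sum over the 8 byte lanes. */
unsigned model_sum_bytes(uint64_t lanes)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)((lanes >> (8 * i)) & 0xff);
    return sum;
}
```

For an all-ones input the modelled cnt result is 0x0808080808080808. Adding that to a zero upper lane gives the same value back, while the byte-wise sum gives 64.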
The error says: operand 2 should be a SIMD vector element -- `umov x0,d0'
(The "SIMD vector element" is defined in section C1.2.4 of the ARM ARM for ARMv8-A.)
The syntax is
umov <Wd>, <Vn>.<Ts>[<index>]
or, for 64 bit,
umov <Xd>, <Vn>.<Ts>[<index>]
which in your case, I guess, can be
umov %1, v0.d[0]
However, I'm not sure whether the code is correct now. I don't have an environment to test it.
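For comparison, here is a minimal sketch of how the whole popcount could look with a vector-element source for umov. It is not the asker's code and has not been run on real hardware here. It assumes GCC or Clang on AArch64, swaps addp for addv (a byte-wise horizontal add, which is my substitution and not something from the question), and uses the %w0 operand modifier to get the 32-bit register name. The function name count_bits_sketch is made up, and the #else branch is only a portable stand-in so the file also builds on other targets.

```c
/* Sketch only: popcount via NEON cnt + addv, reading the result with
   the vector-element form of umov. */
int count_bits_sketch(unsigned long long val)
{
#if defined(__aarch64__)
    unsigned int result;

    __asm__("fmov d0, %1          \n\t"  /* move the 64-bit value into v0 */
            "cnt  v0.8b, v0.8b    \n\t"  /* per-byte bit counts           */
            "addv b0, v0.8b       \n\t"  /* sum the 8 byte lanes          */
            "umov %w0, v0.b[0]    \n\t"  /* element source, not plain d0  */
            : "=r"(result)
            : "r"(val)
            : "v0");

    return (int)result;
#else
    /* Portable stand-in with the same structure: count the bits in each
       byte, then sum the eight per-byte counts. */
    unsigned int sum = 0;
    for (int i = 0; i < 8; i++) {
        unsigned int byte = (unsigned int)((val >> (8 * i)) & 0xff);
        while (byte) {
            sum += byte & 1u;
            byte >>= 1;
        }
    }
    return (int)sum;
#endif
}
```

The largest possible result is 64, so it fits in the single byte lane that umov reads, and count_bits_sketch(-1ull) should return 64.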