arm - Inline AArch64 assembly UMOV source syntax


Below is my attempt at implementing a fast popcount on AArch64 using NEON:

#include <stdio.h>

int count_bits(unsigned long long val) {
  unsigned long long p = 0;
  int c = 0;
  __asm__("dup  %0.2d, %2        \n\t"
          "cnt  %0.8b, %0.8b     \n\t"
          "addp d0, %0.2d        \n\t"
          "umov %1, d0           \n\t"
          : "+w"(p), "+r"(c)
          : "r"(val) : "d0");
  return c;
}

int main(int argc, const char *argv[]) {
  printf("test: %i\n", count_bits(-1ull));
  return 0;
}

And the error:

$ gcc test.c -o test
error: operand 2 should be a SIMD vector element -- `umov x0,d0'

I'm also not sure about the addp instruction: its specifier suggests that it adds two doublewords, whereas the result of the cnt instruction is stored as 8 bytes (%0.8b in addp doesn't work). Shouldn't I rather use uadalp for summing the components?

error: operand 2 should be a SIMD vector element -- `umov x0,d0'

(A "SIMD vector element" is defined in section C1.2.4 of the ARM ARM for ARMv8-A.)

umov <wd>, <vn>.<ts>[<index>] 

or, for 64-bit:

umov <xd>, <vn>.<ts>[<index>] 

which in your case, I guess, can be

umov %1, v0.d[0]  

However, I'm not sure whether the code is correct now; I don't have an environment to test it in.
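For reference, here is a minimal sketch of one way the whole routine could look once the operand syntax is fixed. It is not the asker's original approach: it assumes GCC or Clang extended asm, swaps dup/addp for fmov and addv (addv sums all eight byte lanes produced by cnt, and the maximum total of 64 fits in a byte), and uses the %w0 operand modifier so that umov gets a 32-bit destination register. The non-AArch64 fallback loop is only an assumption added so the function still builds on other targets.

```c
/* Count the set bits in a 64-bit value. */
int count_bits(unsigned long long val) {
#if defined(__aarch64__)
  unsigned int c;
  __asm__("fmov d0, %1          \n\t" /* copy val into the low 64 bits of v0 */
          "cnt  v0.8b, v0.8b    \n\t" /* population count of each byte */
          "addv b0, v0.8b       \n\t" /* sum the 8 byte counts into lane 0 */
          "umov %w0, v0.b[0]    \n\t" /* move byte lane 0 to a W register */
          : "=r"(c)
          : "r"(val)
          : "v0");
  return (int)c;
#else
  /* Portable fallback (assumption): clear the lowest set bit until zero. */
  int c = 0;
  while (val) {
    val &= val - 1;
    c++;
  }
  return c;
#endif
}
```

Calling count_bits(-1ull), as in the original main, should then report 64. Hard-coding v0 and listing it as a clobber keeps the sketch close to the question's code; letting the compiler pick the vector register through a "w" operand would be the other option.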

