At the moment the __builtin_clz* builtins compile down to bsrq on x86_64. Compiling with -mlzcnt makes the compiler emit the actual lzcntq instruction instead.
CountBits<unsigned long long> without -mlzcnt:
_Z9CountBitsmi:
.LFB189:
.cfi_startproc
endbr64
xorl %eax, %eax
testq %rdi, %rdi
je .L1
bsrq %rdi, %rdi
movl $64, %eax
xorq $63, %rdi
subl %edi, %eax
CountBits<unsigned long long> with -mlzcnt:
_Z9CountBitsmi:
.LFB189:
.cfi_startproc
endbr64
xorl %eax, %eax
testq %rdi, %rdi
je .L1
movl $64, %eax
lzcntq %rdi, %rdi
subl %edi, %eax
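For reference, here is a minimal sketch of a function that matches the shape of the listings above: return 0 for a zero input, otherwise 64 minus the number of leading zeros. The name CountBitsSketch is hypothetical, and this is my reading of the assembly rather than the actual CountBits source, which isn't shown here.

```cpp
// Hypothetical stand-in for CountBits; not the real implementation.
// Returns the number of bits needed to represent x (0 for x == 0).
inline int CountBitsSketch(unsigned long long x)
{
    // __builtin_clzll is undefined for 0, hence the explicit check
    // (the testq/je pair in the listings).
    if (x == 0) return 0;
    // GCC/Clang builtin: count of leading zero bits in a 64-bit value.
    // Without -mlzcnt this becomes bsrq + xorq $63; with it, lzcntq.
    return 64 - __builtin_clzll(x);
}
```

Compiling something like this with g++ -O2 -S, once with and once without -mlzcnt, should give comparable output, though the exact instructions will depend on the compiler version.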
I can't measure whether this makes a meaningful difference, because my CPU does not support the instruction. But I assume @sipa would probably know right away whether it's worth bothering.