Currently, it uses a generic build script.
This script assumes:
```julia
const REGISTER_SIZE = 16
const REGISTER_COUNT = 16
const CACHELINE_SIZE = 64
const SIMD_NATIVE_INTEGERS = true
```
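To make the assumed numbers concrete, here is a minimal Julia sketch of what they imply for element counts. The helpers `simd_lanes` and `cacheline_elems` are hypothetical and are not part of the build script.

```julia
# The generic build script's assumed values.
const REGISTER_SIZE = 16          # bytes per SIMD register
const REGISTER_COUNT = 16         # number of SIMD registers
const CACHELINE_SIZE = 64         # bytes per cache line
const SIMD_NATIVE_INTEGERS = true

# Hypothetical helper: how many elements of type T fit in one SIMD register.
simd_lanes(::Type{T}) where {T} = REGISTER_SIZE ÷ sizeof(T)

# Hypothetical helper: how many elements of type T fit in one cache line.
cacheline_elems(::Type{T}) where {T} = CACHELINE_SIZE ÷ sizeof(T)

for T in (Float64, Float32, Int16, Int8)
    println(rpad(string(T), 8), simd_lanes(T), " lanes/register, ",
            cacheline_elems(T), " elements/cache line")
end
```

Under these assumptions a register holds 2 `Float64` or 4 `Float32` values, and a cache line holds 8 `Float64` values.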
If any of these assumptions are wrong, dependent libraries (e.g., LoopVectorization) are likely to produce suboptimal code. If the numbers undershoot the real hardware, some performance is left on the table, but the code should still perform reasonably well.
If the numbers overshoot, the performance consequences could be dire: register spills galore.
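Why overshooting hurts more than undershooting can be seen with a deliberately simplified register-pressure model. This is a hypothetical sketch (`registers_needed`, `fits`, and `best_tile` are made up here, and this is not LoopVectorization's actual cost model): an unrolled m×n matrix-multiply-style kernel keeps m·n accumulator vectors live, plus m loaded vectors and one broadcast.

```julia
# Simplified model (an illustration only): an m×n tile keeps m*n accumulators
# live, plus m vector loads and 1 broadcast per inner iteration.
registers_needed(m::Int, n::Int) = m * n + m + 1

# Does the tile fit without spilling, given the assumed register count?
fits(m, n; register_count::Int) = registers_needed(m, n) <= register_count

# Pick the tile with the largest area (m*n) that fits, searching exhaustively.
function best_tile(register_count::Int; maxdim::Int = 8)
    best = (0, 0)
    for m in 1:maxdim, n in 1:maxdim
        if fits(m, n; register_count) && m * n > best[1] * best[2]
            best = (m, n)
        end
    end
    return best
end

for count in (16, 32)
    m, n = best_tile(count)
    println("assumed $count registers -> $(m)×$(n) tile, ",
            registers_needed(m, n), " registers used")
end
```

In this model, 16 registers yields a 2×6 tile using 15 registers. If the script instead assumed 32, it would pick a 5×5 tile using 31, and on hardware with only 16 the excess would have to spill to the stack. Undershooting only shrinks the tile, which here costs some throughput but does not spill.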
I believe some ARM CPUs lack SIMD support for Float64, so perhaps this should be handled somehow.
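One way the missing SIMD Float64 case might be handled is to fall back to scalar code for that type. Below is a minimal sketch, assuming a hypothetical `SIMD_FLOAT64` flag alongside the existing constants; the build script does not currently define it.

```julia
# Hypothetical flag (not defined by the current build script):
# whether the target supports SIMD Float64.
const SIMD_FLOAT64 = false

const REGISTER_SIZE = 16  # bytes, as assumed by the generic build script

# Vector width (in lanes) a code generator could use for element type T.
# Float64 falls back to scalar code (width 1) when SIMD Float64 is unavailable.
vector_width(::Type{Float64}) = SIMD_FLOAT64 ? REGISTER_SIZE ÷ sizeof(Float64) : 1
vector_width(::Type{T}) where {T} = REGISTER_SIZE ÷ sizeof(T)

println(vector_width(Float64), " ", vector_width(Float32))
```

The more specific `Float64` method takes precedence over the generic one, so other element types are unaffected by the flag.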
Ideally, we'd use a library like CpuId.jl to query hardware info, as we do for AMD and Intel.
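CpuId.jl is built on the x86 `cpuid` instruction, so an ARM path needs a different source of hardware info. On Linux, one option is parsing `/proc/cpuinfo`, whose ARM entries list feature flags on a `Features` line. The sketch below assumes that layout; `cpu_features`, `simd_float64_from`, and `host_features` are hypothetical helpers, and treating the `asimd` flag as implying SIMD Float64 is an assumption about AArch64.

```julia
# Extract feature flags from /proc/cpuinfo-style text (Linux only).
# ARM kernels list them on a "Features" line; x86 kernels use "flags".
function cpu_features(cpuinfo::AbstractString)
    for line in eachline(IOBuffer(cpuinfo))
        parts = split(line, ':'; limit = 2)
        length(parts) == 2 || continue
        if strip(parts[1]) in ("Features", "flags")
            return Set(String.(split(strip(parts[2]))))
        end
    end
    return Set{String}()  # no feature line found
end

# Assumption: on AArch64, "asimd" (Advanced SIMD) implies Float64 vector support.
simd_float64_from(features) = "asimd" in features

# Read the live file if it exists; other OSes need a different source.
function host_features()
    isfile("/proc/cpuinfo") || return Set{String}()
    return cpu_features(read("/proc/cpuinfo", String))
end
```

Register count and width are not reported in `/proc/cpuinfo`, so those would still have to come from the architecture (e.g. `Sys.ARCH`) or a table keyed on the CPU model.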