Committed to daNeuralNet a first working version of a JIT compiler for matrix-vector multiplication that relies on the FMA instruction set (fused multiply-add). This version generates code that is up to twice as fast as OpenBLAS for matrix sizes up to the CPU cache size (usually 100×100 to 200×200), and maintains a marginal lead for…
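For reference, here is a minimal scalar sketch in C of the operation being compiled, not the daNeuralNet code itself. The function name `matvec_ref`, the row-major layout and the use of `double` are assumptions made for illustration. Each step of the inner loop is one multiply-add, which is the operation an FMA instruction performs in a single instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper: y = M * x for a row-major
   rows x cols matrix. Not taken from daNeuralNet. */
void matvec_ref(const double *m, const double *x, double *y,
                size_t rows, size_t cols)
{
    assert(m != NULL && x != NULL && y != NULL);
    for (size_t r = 0; r < rows; r++) {
        const double *row = m + r * cols;  /* start of row r */
        double acc = 0.0;
        for (size_t c = 0; c < cols; c++) {
            /* One multiply-add per element; on FMA-capable targets a
               compiler may emit this as a single fused instruction. */
            acc = row[c] * x[c] + acc;
        }
        y[r] = acc;
    }
}
```

A JIT can specialize this loop for known matrix dimensions, for instance by unrolling it, but how daNeuralNet does so is not described here.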
The post <a href="https://www.delphitools.info/2020/09/28/matrix-vector-multiplication-jit-compiler/" target="_blank">Matrix-Vector multiplication JI