- Using float instead of double for floating-point matrix multiplications (in core/src/matmul.simd.hpp) significantly reduces the computation time
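
A likely reason for the speed-up is that a SIMD register of a given width holds twice as many float values as double values, and each matrix takes half the memory. The sketch below illustrates this under stated assumptions: `matmul` and `lanes` are hypothetical helpers written for this note, not the code in core/src/matmul.simd.hpp, and the 32-byte register width is only an example (it matches AVX2).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from matmul.simd.hpp): naive row-major
// product C = A * B, where A is n x k and B is k x m. Templating on the
// element type T makes it easy to switch between float and double.
template <typename T>
std::vector<T> matmul(const std::vector<T>& a, const std::vector<T>& b,
                      std::size_t n, std::size_t k, std::size_t m) {
    std::vector<T> c(n * m, T(0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const T aip = a[i * k + p];
            // Innermost loop walks contiguous memory in both b and c,
            // which is the access pattern compilers vectorize most easily.
            for (std::size_t j = 0; j < m; ++j) {
                c[i * m + j] += aip * b[p * m + j];
            }
        }
    }
    return c;
}

// Hypothetical helper: number of elements of type T that fit in one SIMD
// register of `register_bytes` bytes (e.g. 32 for a 256-bit register).
template <typename T>
constexpr std::size_t lanes(std::size_t register_bytes) {
    return register_bytes / sizeof(T);
}
```

With these helpers, `lanes<float>(32)` is twice `lanes<double>(32)` on platforms where float is 4 bytes and double is 8 bytes, so each vector instruction processes twice as many elements. The trade-off is precision: float carries roughly 7 significant decimal digits versus roughly 16 for double, so results should be checked against the accuracy the application needs.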