Since this is used heavily during matrix multiplication, it gives a nice little speed boost.
Rust 1.27 stabilized a variety of CPU intrinsics, including SIMD on x86 and x86-64 platforms. This commit switches the optimized Float4 implementation over to those intrinsics, which means Psychopath now compiles on stable Rust with all optimizations enabled. Yay!
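For illustration, here is a minimal sketch of how a 4-wide float type can be built on the stabilized `std::arch` intrinsics. It assumes the SSE path on x86_64 (where SSE is part of the baseline target) and a plain-array fallback on every other target. The module layout, the tuple-struct representation, and the `to_array` helper are hypothetical and not Psychopath's actual Float4 code.

```rust
// Hypothetical sketch: an SSE-backed Float4 on x86_64, plain arrays elsewhere.

#[cfg(target_arch = "x86_64")]
mod imp {
    use std::arch::x86_64::{__m128, _mm_add_ps, _mm_mul_ps, _mm_set_ps, _mm_storeu_ps};
    use std::ops::{Add, Mul};

    #[derive(Clone, Copy)]
    pub struct Float4(__m128);

    impl Float4 {
        pub fn new(a: f32, b: f32, c: f32, d: f32) -> Float4 {
            // _mm_set_ps lists lanes from highest to lowest, so pass them reversed.
            Float4(unsafe { _mm_set_ps(d, c, b, a) })
        }

        pub fn to_array(self) -> [f32; 4] {
            let mut out = [0.0f32; 4];
            // Unaligned store, so `out` needs no special alignment.
            unsafe { _mm_storeu_ps(out.as_mut_ptr(), self.0) };
            out
        }
    }

    impl Add for Float4 {
        type Output = Float4;
        fn add(self, other: Float4) -> Float4 {
            // One SSE instruction adds all four lanes.
            Float4(unsafe { _mm_add_ps(self.0, other.0) })
        }
    }

    impl Mul for Float4 {
        type Output = Float4;
        fn mul(self, other: Float4) -> Float4 {
            // One SSE instruction multiplies all four lanes.
            Float4(unsafe { _mm_mul_ps(self.0, other.0) })
        }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod imp {
    use std::ops::{Add, Mul};

    // Scalar fallback with the same interface as the SSE version.
    #[derive(Clone, Copy)]
    pub struct Float4([f32; 4]);

    impl Float4 {
        pub fn new(a: f32, b: f32, c: f32, d: f32) -> Float4 {
            Float4([a, b, c, d])
        }

        pub fn to_array(self) -> [f32; 4] {
            self.0
        }
    }

    impl Add for Float4 {
        type Output = Float4;
        fn add(self, o: Float4) -> Float4 {
            let (a, b) = (self.0, o.0);
            Float4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
        }
    }

    impl Mul for Float4 {
        type Output = Float4;
        fn mul(self, o: Float4) -> Float4 {
            let (a, b) = (self.0, o.0);
            Float4([a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]])
        }
    }
}

pub use imp::Float4;

fn main() {
    let a = Float4::new(1.0, 2.0, 3.0, 4.0);
    let b = Float4::new(0.5, 0.5, 2.0, 2.0);
    let c = a * b + a;
    println!("{:?}", c.to_array()); // prints [1.5, 3.0, 9.0, 12.0]
}
```

Gating the whole module on `target_arch` keeps both versions behind one interface, so calling code never needs its own `cfg` checks, and none of it requires a nightly compiler.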