The aim of this problem is to learn how to use the XMM registers to multiply floating-point numbers. Matrix multiplication is a slow calculation, especially when every product is computed one value at a time on the floating-point unit. If double precision is not required, packed single-precision arithmetic may be considerably faster, because each 128-bit XMM register holds four single-precision values and one instruction operates on all four at once. This program tests whether that is the case.
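As a rough illustration of the idea, here is a minimal sketch in C using SSE intrinsics rather than hand-written assembly, so it is not necessarily how the program itself is written. The function names `matmul4_scalar` and `matmul4_sse`, the fixed 4x4 size, and the row-major layout are all assumptions made for the example. The packed version builds each row of the result by scaling whole rows of `b` and accumulating them in one XMM register.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */

/* Hypothetical reference version: scalar 4x4 multiply, c = a * b.
 * All matrices are assumed to be row-major arrays of 16 floats. */
void matmul4_scalar(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[4 * i + k] * b[4 * k + j];
            c[4 * i + j] = sum;
        }
    }
}

/* Hypothetical packed version: row i of c is the sum over k of
 * a[i][k] * (row k of b), computed four floats at a time. */
void matmul4_sse(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; ++i) {
        __m128 acc = _mm_setzero_ps();                /* four running sums */
        for (int k = 0; k < 4; ++k) {
            __m128 a_ik  = _mm_set1_ps(a[4 * i + k]); /* broadcast one element of a */
            __m128 b_row = _mm_loadu_ps(b + 4 * k);   /* load row k of b (unaligned load) */
            acc = _mm_add_ps(acc, _mm_mul_ps(a_ik, b_row));
        }
        _mm_storeu_ps(c + 4 * i, acc);                /* write row i of c */
    }
}
```

Calling both functions on the same inputs and comparing the two outputs is a simple correctness check before timing them. Any speedup depends on the compiler, the optimization flags, and the CPU, so it has to be measured rather than assumed.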

