On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance - Matrix-Multiply Revisited | IEEE Conference Publication | IEEE Xplore