Sunder: a programmable hardware prefetch architecture for numerical loops | IEEE Conference Publication | IEEE Xplore