N latency 2N I/O-bandwidth 2D-array matrix multiplication algorithm | IEEE Conference Publication | IEEE Xplore