Is it possible to achieve a teraflop/s on a chip? From high performance algorithms to architectures | IEEE Conference Publication