On Hardware-Accelerated Maximally-Efficient Systolic Arrays