Abstract:
Large-scale parallel implementation of matrix multiply-and-accumulate (MAC) cores in the analog voltage domain faces significant energy and area constraints under reduced supply voltage. A spatial multi-bit sub-1-V time-domain matrix multiplier interface is presented that uses multi-bit back-gate-driven delay elements as a scalable alternative for various approximate computing applications. A single-chip solution is demonstrated for two application modes: a high-throughput digitally driven mode for acceleration and a low-energy analog front-end mode for sensing. In accelerate mode, the system achieves an aggregate throughput of 21.6 GMAC/s with an energy efficiency of 9 TOPS/W. In sense mode, the system achieves an energy efficiency of 55.3 TOPS/W for classification. The proposed architecture takes 16 parallel 6-bit input vectors and performs matrix MAC computations using time-domain signal processing with 3-bit resistive weights at a sub-1-V supply of 0.7 V. An integrated speculative time-to-digital converter is employed for 6-bit time-domain quantization with an on-chip mismatch calibration scheme. The prototype is fabricated in 65-nm CMOS technology and occupies an active area of 0.04 mm². The system performs image recognition of handwritten digits using a machine learning scheme and demonstrates an average classification accuracy of 84.3% on the MNIST dataset. The resulting energy per MAC computation in the proposed spatial architecture is about 15× lower than that of a digital CMOS combinational logic-based parallel-tree MAC.
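As a rough illustration of the bit widths quoted above, the following is a minimal behavioral sketch of one 16-input MAC with 6-bit inputs, 3-bit weights, and a 6-bit output code. It is a purely numerical model under stated assumptions, not the paper's circuit: the unsigned weight encoding, the linear full-scale mapping onto 64 output levels, and the names `mac_quantized` and `energy_per_op_pj` are all hypothetical and are not taken from the abstract. The helper at the end only converts a TOPS/W figure into picojoules per operation.

```python
# Behavioral sketch only. Bit widths come from the abstract; the unsigned
# encoding and the linear full-scale quantization are assumptions.
N_INPUTS = 16      # parallel inputs per MAC
INPUT_BITS = 6     # input resolution
WEIGHT_BITS = 3    # weight resolution
OUTPUT_BITS = 6    # quantizer resolution


def mac_quantized(inputs, weights):
    """Return (exact accumulated sum, assumed 6-bit output code)."""
    if len(inputs) != N_INPUTS or len(weights) != N_INPUTS:
        raise ValueError("expected %d inputs and weights" % N_INPUTS)

    in_max = (1 << INPUT_BITS) - 1     # 63
    w_max = (1 << WEIGHT_BITS) - 1     # 7
    if any(not 0 <= x <= in_max for x in inputs):
        raise ValueError("input out of 6-bit range")
    if any(not 0 <= w <= w_max for w in weights):
        raise ValueError("weight out of 3-bit range")

    # Exact multiply-and-accumulate over the 16 input/weight pairs.
    acc = sum(x * w for x, w in zip(inputs, weights))

    # Assumed quantizer: map [0, full_scale] linearly onto 64 codes.
    full_scale = N_INPUTS * in_max * w_max      # 7056
    levels = 1 << OUTPUT_BITS                   # 64
    code = acc * levels // (full_scale + 1)
    return acc, code


def energy_per_op_pj(tops_per_watt):
    """Convert TOPS/W to pJ per operation (1 TOPS/W equals 1 pJ/op)."""
    return 1.0 / tops_per_watt


if __name__ == "__main__":
    print(mac_quantized([32] * N_INPUTS, [4] * N_INPUTS))
    print(energy_per_op_pj(9.0), energy_per_op_pj(55.3))
```

Under these assumptions, the conversion gives roughly 0.11 pJ per operation in accelerate mode and roughly 0.018 pJ per operation in sense mode. How many operations the authors count per MAC is not stated in the abstract, so the sketch does not convert these figures to energy per MAC.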
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (Volume: 8, Issue: 3, September 2018)