
2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications


Abstract:

The growing computational demands of AI inference have led to the widespread use of hardware accelerators across platforms, spanning from the edge to the datacenter/cloud. Certain AI application areas, such as high-frequency trading (HFT) [1–2], have a hard inference-latency deadline for successful execution. We present our new AI accelerator, which achieves high inference capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based AI services and pipelined inference applications, including large language models (LLMs). Owing to its low thermal design power (TDP), the scale-out solution can also effectively support multi-stream applications and total cost of ownership (TCO)-centric systems.
Date of Conference: 18-22 February 2024
Date Added to IEEE Xplore: 13 March 2024
Conference Location: San Francisco, CA, USA
