ABSTRACT
Machine learning accelerators present a unique set of design challenges spanning chip architecture, instruction set, server design, compilers, and both inter- and intra-chip connectivity. With AWS Trainium, we have leveraged AWS's end-to-end ownership of the chip, server, network, compiler, and runtime tools to co-design and optimize across all layers, emphasizing simplicity and ease of use. This talk will illustrate the design principles, tradeoffs, and lessons learned during the development of three generations of AWS ML products, from conceptualization to placing systems in the hands of AWS customers.