Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs (IEEE Conference Publication)