Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply (IEEE Conference Publication, IEEE Xplore)