CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters | IEEE Conference Publication | IEEE Xplore