Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation | IEEE Conference Publication | IEEE Xplore