Abstract
This paper proposes a network-on-chip (NoC) design customized for message reduction. Selected ordinary routers are augmented with a dedicated Reduce Processing Unit (RPU) that performs reduce computations hop by hop and adaptively learns the transmission paths of reduction messages. Specifically, for reductions over small data sets, the operands are transmitted directly through the NoC, and the enhanced routers along the transmission path complete the reduction in place. This both speeds up processing and coalesces messages. We also introduce an adaptive method, built on the deterministic routing algorithm, that lets these routers learn transmission paths accurately and thereby improves processing efficiency. We present the detailed micro-architecture and evaluate its performance, power consumption, and chip area. Test results show that the design improves reduce and all_reduce performance by a factor of 2.67 to 11.76, while the power and chip-area overheads remain limited.
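The abstract does not give the RPU's exact algorithm, so the following Python sketch only illustrates the general idea of hop-by-hop in-network reduction under deterministic routing and why it coalesces messages. It assumes a 2D mesh with dimension-order (XY) routing, one value per node, and a single root. The functions `xy_next_hop`, `in_network_reduce`, and `naive_link_messages` are hypothetical, and the per-router `expected` counter is just one plausible reading of how a router might "learn" the transmission path: because deterministic routing fixes every sender's path, each router can know how many contributions to wait for before forwarding a single combined message.

```python
from collections import defaultdict


def xy_next_hop(src, dst):
    """Next hop under dimension-order (XY) routing: correct x first, then y."""
    (x, y), (dx, dy) = src, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return src  # already at the destination


def in_network_reduce(values, root, op=lambda a, b: a + b):
    """Reduce `values` (dict: mesh node -> value) to `root`, combining at each router.

    Assumes `values` covers every node of a full mesh, so every next hop exists.
    Returns (result, link_messages), where link_messages counts link traversals.
    """
    # Each non-root router forwards to its XY next hop toward the root,
    # which yields a fixed reduction tree under deterministic routing.
    parent = {n: xy_next_hop(n, root) for n in values if n != root}

    # Hypothetical "learned" state: how many child contributions each router expects.
    expected = defaultdict(int)
    for child, p in parent.items():
        expected[p] += 1

    def hops(n):
        return abs(n[0] - root[0]) + abs(n[1] - root[1])

    partial = dict(values)
    received = defaultdict(int)
    link_messages = 0
    # A parent is always exactly one hop closer to the root, so visiting routers
    # farthest-first emulates "forward once all expected children have arrived".
    for n in sorted(parent, key=hops, reverse=True):
        assert received[n] == expected[n]  # every child has already reported
        p = parent[n]
        partial[p] = op(partial[p], partial[n])  # combine in place at the parent
        received[p] += 1
        link_messages += 1  # one coalesced message on link n -> p
    return partial[root], link_messages


def naive_link_messages(nodes, root):
    """Link traversals if every node sends its own message end to end to the root."""
    return sum(abs(x - root[0]) + abs(y - root[1]) for (x, y) in nodes)


if __name__ == "__main__":
    mesh = [(x, y) for x in range(4) for y in range(4)]
    values = {n: 1 for n in mesh}
    total, msgs = in_network_reduce(values, root=(0, 0))
    print(total, msgs, naive_link_messages(mesh, (0, 0)))  # 16 15 48
```

On a 4x4 mesh reducing to a corner, combining at each hop needs one message per non-root router (15 link traversals) instead of 48 when every message travels end to end, which is the coalescing effect the abstract refers to. The sketch models only message counts, not the latency, power, or area figures reported in the paper.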
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, H., Lu, S., Zhang, Y., Yang, G., Zheng, W. (2014). Customized Network-on-Chip for Message Reduction. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8630. Springer, Cham. https://doi.org/10.1007/978-3-319-11197-1_41
DOI: https://doi.org/10.1007/978-3-319-11197-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11196-4
Online ISBN: 978-3-319-11197-1
eBook Packages: Computer Science, Computer Science (R0)