Abstract
High-speed networking in clusters usually relies on advanced hardware features in NICs, such as zero-copy. Open-MX is a high-performance message passing stack designed for regular Ethernet hardware without such capabilities.
We present the addition of multiqueue support in the Open-MX receive stack so that all incoming packets for the same process are processed on the same core. We then introduce the idea of binding the target process near its dedicated receive queue. This model leads to a more cache-efficient receive stack for Open-MX. It also shows that very simple and stateless hardware features may have a significant impact on message passing performance over Ethernet.
Implementing this model in NIC firmware reveals that it may not be as efficient as some manually tuned micro-benchmarks. However, our multiqueue receive stack generally performs better than the original single-queue stack, especially on large communication patterns where multiple processes are involved and manual binding is difficult.
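The receive model described above can be illustrated with a minimal sketch. The abstract does not say how the firmware picks a queue, so everything here is an assumption made for illustration: the `Packet` type, the `dst_endpoint` header field, the modulo rule in `select_queue`, and the `queue_to_core` table are all hypothetical and are not taken from Open-MX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst_endpoint: int  # hypothetical header field naming the receiving process
    payload: bytes


def select_queue(pkt: Packet, num_queues: int) -> int:
    """Stateless queue choice: depends only on one header field.

    Assumed rule (not from the paper): endpoint index modulo queue count.
    """
    return pkt.dst_endpoint % num_queues


def core_for_endpoint(endpoint: int, num_queues: int, queue_to_core: list) -> int:
    """Core that handles the queue serving `endpoint`.

    Binding the receiving process to (or near) this core keeps packet
    processing and the application on the same cache.
    """
    return queue_to_core[endpoint % num_queues]


# Hypothetical machine: 4 receive queues whose interrupts go to cores 0, 2, 4, 6.
QUEUE_TO_CORE = [0, 2, 4, 6]

packets = [Packet(5, b"a"), Packet(2, b"b"), Packet(5, b"c")]
queues = [select_queue(p, len(QUEUE_TO_CORE)) for p in packets]
target_core = core_for_endpoint(5, len(QUEUE_TO_CORE), QUEUE_TO_CORE)
```

Because the choice depends only on the packet header, both packets for endpoint 5 land on the same queue and therefore the same core. On Linux, one possible way to perform the binding step from user space would be `os.sched_setaffinity(0, {target_core})`; the abstract does not state how Open-MX does it.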
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Goglin, B. (2009). NIC-Assisted Cache-Efficient Receive Stack for Message Passing over Ethernet. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_98
DOI: https://doi.org/10.1007/978-3-642-03869-3_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)