ABSTRACT
Recently, high-speed interconnects capable of remote direct memory access (RDMA), such as InfiniBand and iWARP, have gained considerable popularity due to their low latency and high bandwidth. Most existing studies of RDMA have focused mainly on performance. However, power management has become essential for high-end systems such as enterprise servers and high performance computing nodes, which are often equipped with RDMA-capable network adapters, so it is important to take a fresh look at the benefits of RDMA from the power perspective.
In this paper, we provide a detailed empirical study of the benefits of RDMA in terms of power savings compared with traditional communication protocols such as TCP/IP. We used two popular RDMA adapters in our evaluations: Mellanox ConnectX InfiniBand HCAs and Chelsio T3 10GE RNICs. To isolate the impact of communication on power consumption, our evaluation focused on micro-benchmarks that exercise different communication patterns. We also studied several important factors that may affect the performance and power consumption of RDMA adapters, such as the use of polling versus blocking, CPU speeds, and extra memory copies.
We show that using high-speed RDMA adapters can result in a significant amount of power consumption during communication. (In one test, system power increased by as much as 50 watts, or over 30% of the idle power.) We found that RDMA generally has better power efficiency than TCP/IP, especially in communication-intensive phases, for example when large messages are transferred. RDMA achieves its power savings by minimizing the interactions between the network adapters and other system components such as the CPUs and the memory: although nearly the same amount of data must pass through the network adapters for both RDMA and TCP/IP, RDMA requires far fewer CPU cycles for protocol processing and also generates less memory bus traffic, both of which contribute to its power savings.
Overall, our research demonstrated that RDMA not only provides high communication performance, but also offers excellent power efficiency, making it a desirable choice in environments that have strict power/energy constraints and demand high communication performance.
Index Terms
- Evaluating high performance communication: a power perspective