
INAM^2: InfiniBand Network Analysis and Monitoring with MPI

Conference paper, published in High Performance Computing (ISC High Performance 2016).

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9697)

Abstract

Modern high-end computing is being driven by the tight integration of several hardware and software components. On the hardware front, there are the multi-/many-core architectures (including accelerators and co-processors) and high-end interconnects like InfiniBand that are continually pushing the envelope of raw performance. On the software side, there are several high performance implementations of popular parallel programming models that are designed to take advantage of the high-end features offered by the hardware components and deliver multi-petaflop level performance to end applications. Together, these components allow scientists and engineers to tackle grand challenge problems in their respective domains.

Understanding and gaining insights into the performance of end applications on these modern systems is a challenging task. Several researchers and hardware manufacturers have attempted to tackle this by designing tools to inspect network-level or MPI-level activities. However, all existing tools perform the inspection in a disjoint fashion and are unable to correlate the data generated by profiling the network and the MPI library. This results in a loss of valuable information that can provide the insights required for understanding the performance of high-end computing applications. In this paper, we take up this challenge and design InfiniBand Network Analysis and Monitoring with MPI (INAM^2). INAM^2 allows users to analyze and visualize the communication happening in the network in conjunction with data obtained from the MPI library. Our experimental analysis shows that INAM^2 is able to profile and visualize the communication with very low performance overhead at scale.
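The central idea, correlating fabric-level traffic with MPI-level communication data rather than inspecting each in isolation, can be sketched as follows. This is a minimal illustration, not the INAM^2 implementation: the `PortSample`/`MpiSample` structures and the `correlate` function are hypothetical stand-ins for data that a real tool would gather from InfiniBand port counters and from the MPI library.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class PortSample:          # hypothetical: one InfiniBand port-counter reading
    node: str
    t: float               # timestamp in seconds
    xmit_bytes: int        # bytes transmitted on the port in this interval

@dataclass
class MpiSample:           # hypothetical: traffic reported by the MPI library
    node: str
    t: float
    bytes_sent: int        # bytes sent by MPI processes on this node

def correlate(port_samples, mpi_samples, window=1.0):
    """Bucket both streams into per-node time windows so network-level
    and MPI-level traffic can be compared side by side."""
    buckets = defaultdict(lambda: {"port": 0, "mpi": 0})
    for s in port_samples:
        buckets[(s.node, int(s.t // window))]["port"] += s.xmit_bytes
    for s in mpi_samples:
        buckets[(s.node, int(s.t // window))]["mpi"] += s.bytes_sent
    # Traffic seen on the wire but not attributed to MPI is the kind of
    # insight a joint view enables and disjoint tools miss.
    return {k: v["port"] - v["mpi"] for k, v in buckets.items()}
```

For example, a port sample of 100 bytes and an MPI sample of 60 bytes on the same node in the same window would yield 40 unattributed bytes for that (node, window) pair.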

This research is supported in part by National Science Foundation grants #CCF-1213084, #CNS-1419123, #CNS-1513120, #ACI-1450440 and #IIS-1447804.



Acknowledgements

We would like to thank Michael Knox from Cray and John Hanks from KAUST for their feedback on the OSU INAM package, which enabled us to fix several bugs and performance issues.

Author information

Corresponding author: Hari Subramoni.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Subramoni, H. et al. (2016). INAM^2: InfiniBand Network Analysis and Monitoring with MPI. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_16


  • DOI: https://doi.org/10.1007/978-3-319-41321-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41320-4

  • Online ISBN: 978-3-319-41321-1

  • eBook Packages: Computer Science (R0)
