ABSTRACT
We examine natural expectations on communication performance using MPI derived datatypes in comparison to the baseline, "raw" performance of communicating simple, noncontiguous data layouts. We show that common MPI libraries sometimes violate these datatype performance expectations, and discuss reasons why this happens, but also show cases where MPI libraries perform well. Our findings are in many ways surprising and disappointing. First, the performance of derived datatypes is sometimes worse than the semantically equivalent packing and unpacking using the corresponding MPI functionality. Second, the communication performance equivalence stated in the MPI standard between a single contiguous datatype and the repetition of its constituent datatype does not hold universally. Third, the heuristics that are typically employed by MPI libraries at type-commit time are insufficient to enforce natural performance guidelines, and better type normalization heuristics may have a significant performance impact. We show cases where all the MPI type constructors are necessary to achieve the expected performance for certain data layouts. We describe our benchmarking approach to verify the datatype performance guidelines, and present extensive verification results for different MPI libraries.
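The first two expectations named in the abstract can be written compactly as relations between communication times. The following is only a sketch in notation assumed here for illustration, not the paper's own formalism: $T(\cdot)$ stands for the measured time of an operation, $D$ for a derived datatype describing a noncontiguous layout, $c$ for a repetition count, and $\mathrm{contig}(c, D)$ for the type built by `MPI_Type_contiguous` from $c$ copies of $D$.

```latex
% Notation is assumed for illustration: T(.) is measured time,
% D a derived datatype, c a count, buf_packed the buffer filled by MPI_Pack.

% Expectation 1: sending with a derived datatype should not be slower than
% explicitly packing with MPI_Pack and sending the packed buffer.
\[
  T\bigl(\mathrm{Send}(\mathit{buf}, c, D)\bigr)
  \;\lesssim\;
  T\bigl(\mathrm{Pack}(\mathit{buf}, c, D)\bigr)
  + T\bigl(\mathrm{Send}(\mathit{buf}_{\mathrm{packed}}, \mathtt{MPI\_PACKED})\bigr)
\]

% Expectation 2: one element of a contiguous type over D should perform
% like c repetitions of D itself.
\[
  T\bigl(\mathrm{Send}(\mathit{buf}, 1, \mathrm{contig}(c, D))\bigr)
  \;\approx\;
  T\bigl(\mathrm{Send}(\mathit{buf}, c, D)\bigr)
\]
```

Read this way, the abstract's first finding says that the first relation is sometimes violated in practice, and the second finding says that the second relation does not hold universally across MPI libraries.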