A data streaming model in MPI

Published: 15 November 2015

Abstract

The data streaming model is an effective way to tackle the challenge of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining the message-passing and streaming programming models. MPI, the de facto standard for programming HPC systems, cannot intuitively express the communication patterns and functional operations required by streaming models. In this work, we designed and implemented MPIStream, a data streaming library atop MPI, to allocate data producers and consumers, to stream data continuously or irregularly, and to process data at run time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure the data processing rate. The performance of the library depends largely on the size of the stream element, the number of data producers and consumers, and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved a 200 GB/s processing rate on a Blue Gene/Q supercomputer. We illustrate how a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring, and threshold collective operations.
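The abstract describes a model in which data producers continuously emit stream elements and data consumers apply a processing function to each element at run time. The sketch below is a minimal single-node analogue of that producer/consumer semantics, using Python threads and a FIFO queue. It is a conceptual illustration only: the actual MPIStream API (built atop MPI) is not reproduced on this page, and the names `producer`, `consumer`, and `run_stream` are hypothetical.

```python
import threading
import queue

SENTINEL = None  # end-of-stream marker

def producer(out_q, elements):
    """Emit each stream element in order, then signal end of stream."""
    for elem in elements:
        out_q.put(elem)
    out_q.put(SENTINEL)

def consumer(in_q, process, results):
    """Apply `process` to every received element until end of stream."""
    while True:
        elem = in_q.get()
        if elem is SENTINEL:
            break
        results.append(process(elem))

def run_stream(elements, process):
    """Run one producer and one consumer concurrently over a FIFO channel."""
    q = queue.Queue()
    results = []
    p = threading.Thread(target=producer, args=(q, elements))
    c = threading.Thread(target=consumer, args=(q, process, results))
    p.start(); c.start()
    p.join(); c.join()
    return results

if __name__ == "__main__":
    # Stream four elements and reduce each one on the consumer side.
    out = run_stream([[1, 2], [3, 4], [5, 6], [7, 8]], sum)
    print(out)  # [3, 7, 11, 15]
```

For scale, note that the reported aggregate rate of 200 GB/s over 2,048 producer/consumer pairs corresponds to roughly 100 MB/s sustained per pair.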




Published In

ExaMPI '15: Proceedings of the 3rd Workshop on Exascale MPI
November 2015
51 pages
ISBN: 9781450339988
DOI: 10.1145/2831129

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. HPC
  2. MPI
  3. data-intensive
  4. streaming model

Qualifiers

  • Research-article

Funding Sources

  • European Commission through the EPiGRAM project

Conference

SC15

Acceptance Rates

ExaMPI '15 paper acceptance rate: 5 of 11 submissions (45%)
Overall acceptance rate: 5 of 11 submissions (45%)


Cited By

  • (2024) HStream: A hierarchical data streaming engine for high-throughput scientific applications. Proceedings of the 53rd International Conference on Parallel Processing, pp. 231-240. DOI: 10.1145/3673038.3673150. Online publication date: 12-Aug-2024.
  • (2024) Improving MPI Language Support Through Custom Datatype Serialization. Proceedings of the SC '24 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 414-424. DOI: 10.1109/SCW63240.2024.00062. Online publication date: 17-Nov-2024.
  • (2022) Revisiting the Design of Parallel Stream Joins on Trusted Execution Environments. Algorithms, 15(6):183. DOI: 10.3390/a15060183. Online publication date: 25-May-2022.
  • (2022) DSParLib: A C++ Template Library for Distributed Stream Parallelism. International Journal of Parallel Programming, 50(5-6):454-485. DOI: 10.1007/s10766-022-00737-2. Online publication date: 29-Oct-2022.
  • (2021) HFlow: A Dynamic and Elastic Multi-Layered I/O Forwarder. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 114-124. DOI: 10.1109/Cluster48925.2021.00064. Online publication date: Sep-2021.
  • (2019) Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Transactions on Parallel and Distributed Systems, 30(3):575-588. DOI: 10.1109/TPDS.2018.2867222. Online publication date: 1-Mar-2019.
  • (2019) Exploring stream parallel patterns in distributed MPI environments. Parallel Computing, 84:24-36. DOI: 10.1016/j.parco.2019.03.004. Online publication date: 1-May-2019.
  • (2018) Supporting MPI-distributed stream parallel patterns in GrPPI. Proceedings of the 25th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3236367.3236380. Online publication date: 23-Sep-2018.
  • (2018) The SAGE project: a storage centric approach for exascale computing. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 287-292. DOI: 10.1145/3203217.3205341. Online publication date: 8-May-2018.
  • (2018) Performance Factor Analysis and Scope of Optimization for Big Data Processing on Cluster. 2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 418-423. DOI: 10.1109/PDGC.2018.8745857. Online publication date: Dec-2018.
