Abstract
Programmers face many challenges in obtaining performance on machines with increasingly capable, yet increasingly complex, hardware. A trend toward task-parallel and asynchronous many-task programming models aims to alleviate the burden of parallel programming on a vast array of current and future platforms. One such model, Concurrent Collections (CnC), provides a programming paradigm that emphasizes separation of concerns: domain experts concentrate on their algorithms and correctness, while performance experts handle mapping and tuning to a target platform. With the CnC model, a deep understanding of parallel constructs and behavior is not necessary to write parallel applications that run on various multi-threaded and multi-core platforms. However, performance can vary greatly depending on the granularity of the tasks and data declared by the programmer. These program-specific decisions are not part of CnC's tuning capabilities and must be tuned within the program itself. We analyze performance behavior for the LULESH application in CnC by tuning various elements in each collection, and we demonstrate the effects of different techniques for modifying task and data granularity in CnC collections. Our fully tiled CnC implementation outperforms its OpenMP counterpart by 3× on 48 processors. Finally, we propose guidelines for emulating these techniques to obtain high performance while improving programmability.
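The coarsening idea behind the abstract's "fully tiled" variant can be illustrated with a minimal sketch. This is not the CnC API or the paper's actual implementation; it is a hypothetical C++ example in which a `std::map` stands in for an item collection, contrasting one logical task per element against one task per tile of elements, the granularity change the paper tunes:

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// A tile of values; in the tiled scheme one collection entry holds a
// whole tile rather than a single element.
using Tile = std::vector<double>;

// Fine-grained view: each loop iteration models one step instance
// operating on one element (high per-task scheduling overhead).
double fine_grained_sum(const std::vector<double>& data) {
    double sum = 0.0;
    for (double x : data)
        sum += x;  // one "task" per element
    return sum;
}

// Coarsened view: elements are grouped into tiles of `tile_size`,
// and each tile is processed by one "task", amortizing overhead.
double tiled_sum(const std::vector<double>& data, std::size_t tile_size) {
    std::map<std::size_t, Tile> items;  // stand-in for an item collection
    for (std::size_t i = 0; i < data.size(); ++i)
        items[i / tile_size].push_back(data[i]);  // key = tile id
    double sum = 0.0;
    for (const auto& kv : items)  // one step instance per tile
        sum += std::accumulate(kv.second.begin(), kv.second.end(), 0.0);
    return sum;
}
```

Both versions compute the same result; only the unit of work handed to the scheduler changes, which is why granularity can be tuned without altering program semantics.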
Acknowledgments
This research is supported by the Department of Energy under contract DE-FC02-12ER26104. We would also like to thank Ellen Porter, Kath Knobe, Nick Vrvilo, and Zoran Budimlić for their comments and feedback during discussions regarding CnC.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Liu, C., Kulkarni, M. (2017). Evaluating Performance of Task and Data Coarsening in Concurrent Collections. In: Ding, C., Criswell, J., Wu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science(), vol 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_24
Print ISBN: 978-3-319-52708-6
Online ISBN: 978-3-319-52709-3