
Evaluating Performance of Task and Data Coarsening in Concurrent Collections

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10136)

Abstract

Programmers face many challenges in obtaining performance on machines with increasingly capable, yet increasingly complex, hardware. A trend towards task-parallel and asynchronous many-task programming models aims to alleviate the burden of parallel programming on a vast array of current and future platforms. One such model, Concurrent Collections (CnC), provides a programming paradigm that emphasizes the separation of concerns: domain experts concentrate on their algorithms and correctness, whereas performance experts handle mapping and tuning to a target platform. A deep understanding of parallel constructs and behavior is not necessary to write parallel applications that run on various multi-threaded and multi-core platforms when using the CnC model. However, performance can vary greatly depending on the granularity of the tasks and data declared by the programmer. These program-specific decisions are not part of the CnC tuning capabilities and must be tuned within the program itself. We analyze performance behavior when tuning various elements in each collection for the LULESH application written in CnC, and we demonstrate the effects of different techniques for modifying task and data granularity in CnC collections. Our fully tiled CnC implementation outperforms its OpenMP counterpart by 3× on 48 processors. Finally, we propose guidelines that emulate the techniques used to obtain high performance while improving programmability.
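The following is a minimal sketch of the coarsening idea the abstract describes, assuming the Intel CnC for C++ API (cnc/cnc.h). The collection names, the toy per-tile kernel, and the tile size are illustrative assumptions and are not taken from the paper's LULESH implementation; the point is only that tags can index tiles rather than individual elements (coarser tasks) and that items can hold whole tiles rather than single values (coarser data).

```cpp
// Sketch: task and data coarsening with Intel CnC for C++.
// All names (tiled_step, tiles_in, TILE, ...) are hypothetical.
#include <cnc/cnc.h>
#include <vector>
#include <cstdio>

const int N_TILES = 64;    // number of coarsened tasks (step instances)
const int TILE    = 1024;  // elements per tile (data granularity)

struct tiled_context;

// One step instance processes a whole tile instead of a single element.
struct tiled_step {
    int execute( const int & tile_id, tiled_context & c ) const;
};

struct tiled_context : public CnC::context< tiled_context > {
    CnC::step_collection< tiled_step >                steps;
    CnC::tag_collection< int >                        tile_tags; // one tag per tile
    CnC::item_collection< int, std::vector<double> >  tiles_in;  // coarsened data items
    CnC::item_collection< int, std::vector<double> >  tiles_out;

    tiled_context()
        : steps( *this ), tile_tags( *this ), tiles_in( *this ), tiles_out( *this )
    {
        tile_tags.prescribes( steps, *this );
    }
};

int tiled_step::execute( const int & tile_id, tiled_context & c ) const
{
    std::vector<double> in;
    c.tiles_in.get( tile_id, in );            // one get per tile, not per element

    std::vector<double> out( in.size() );
    for( size_t i = 0; i < in.size(); ++i )   // sequential inner loop over the tile
        out[i] = 2.0 * in[i];                 // stand-in for a real kernel

    c.tiles_out.put( tile_id, out );
    return CnC::CNC_Success;
}

int main()
{
    tiled_context ctx;
    for( int t = 0; t < N_TILES; ++t ) {
        ctx.tiles_in.put( t, std::vector<double>( TILE, 1.0 ) );
        ctx.tile_tags.put( t );               // prescribe one coarse task per tile
    }
    ctx.wait();                               // run all prescribed step instances

    std::vector<double> first;
    ctx.tiles_out.get( 0, first );
    std::printf( "tile 0, element 0 = %f\n", first[0] );
    return 0;
}
```

Increasing TILE while decreasing N_TILES makes each step instance do more work per get/put, presumably trading runtime and collection overhead against available parallelism, which is the granularity trade-off the paper evaluates for LULESH.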



Acknowledgments

This research is supported by the Department of Energy under contract DE-FC02-12ER26104. We would also like to thank Ellen Porter, Kath Knobe, Nick Vrvilo, and Zoran Budimlic for their comments and feedback during discussions regarding CnC.

Author information

Corresponding author: Chenyang Liu


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, C., Kulkarni, M. (2017). Evaluating Performance of Task and Data Coarsening in Concurrent Collections. In: Ding, C., Criswell, J., Wu, P. (eds.) Languages and Compilers for Parallel Computing. LCPC 2016. Lecture Notes in Computer Science, vol. 10136. Springer, Cham. https://doi.org/10.1007/978-3-319-52709-3_24


  • DOI: https://doi.org/10.1007/978-3-319-52709-3_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52708-6

  • Online ISBN: 978-3-319-52709-3

