
Extending Synchronization Constructs in OpenMP to Exploit Pipeline Parallelism on Heterogeneous Multi-core

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7017)

Abstract

The ability to express multiple levels of parallelism is one of the significant features of the OpenMP parallel programming model. However, pipeline parallelism is not well supported in OpenMP. This paper proposes extensions to OpenMP directives aimed at expressing pipeline parallelism effectively. The extended directives fall into two groups: one defines precedence at the thread level, and the other defines precedence at the iteration level. With these directives, programmers can build a pipeline model more easily and exploit more parallelism to improve performance. To support the directives, a set of runtime synchronization interfaces is implemented on the Cell heterogeneous multi-core architecture using the signal block communication mechanism. Experimental results indicate that the proposed pipeline scheme achieves good performance compared with naively parallelized versions of the applications.
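The abstract does not give the syntax of the proposed directives, so the following is only a hedged illustration of what iteration-level precedence means, not the authors' directives or their Cell runtime. It uses the doacross form of the standard `ordered` construct (`ordered(2)` with `depend(sink: ...)` and `depend(source)`), which was standardized later in OpenMP 4.5. The function name `wavefront_corner` and the grid recurrence are hypothetical. Rows are distributed across threads, and row `i` may process column `j` as soon as row `i-1` has finished column `j`, so the rows run as a pipeline instead of one after another.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example: fill an n x m grid where each interior cell depends
 * on its north and west neighbours,
 *     a[i][j] = a[i-1][j] + a[i][j-1],   a[0][*] = a[*][0] = 1,
 * and return a[n-1][m-1] (equal to the binomial coefficient C(n+m-2, n-1)).
 * Returns -1 if allocation fails. */
long long wavefront_corner(int n, int m)
{
    assert(n > 0 && m > 0);
    long long *a = malloc((size_t)n * (size_t)m * sizeof *a);
    if (a == NULL)
        return -1;

    /* Boundary row and column. */
    for (int j = 0; j < m; j++)
        a[j] = 1;
    for (int i = 0; i < n; i++)
        a[(size_t)i * m] = 1;

    /* ordered(2) declares a doacross nest over (i, j). Only the i loop is
     * shared among threads; each row's j loop runs in order on one thread,
     * so the west neighbour (i, j-1) is always ready. The sink dependence
     * below covers the cross-row (north) dependence. */
    #pragma omp parallel for ordered(2)
    for (int i = 1; i < n; i++) {
        for (int j = 1; j < m; j++) {
            /* Wait until iteration (i-1, j) has signalled completion.
             * For i == 1 the sink lies outside the iteration space and
             * the dependence is ignored. */
            #pragma omp ordered depend(sink: i - 1, j)
            a[(size_t)i * m + j] =
                a[(size_t)(i - 1) * m + j] + a[(size_t)i * m + (j - 1)];
            /* Signal that (i, j) is done, releasing row i+1 at column j. */
            #pragma omp ordered depend(source)
        }
    }

    long long corner = a[(size_t)(n - 1) * m + (m - 1)];
    free(a);
    return corner;
}
```

For example, `wavefront_corner(5, 5)` fills a 5 x 5 grid and returns C(8, 4) = 70. If the file is compiled without OpenMP support, the pragmas are ignored and the loops run serially with the same result. With OpenMP enabled, the per-cell `source`/`sink` pairs act as the point-to-point signals that a pipeline needs. This is the same kind of precedence the paper implements with Cell signal blocks, though the mechanism shown here is the standard OpenMP one.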



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, S., Yao, S., He, H., Sun, L., Chen, Y., Peng, Y. (2011). Extending Synchronization Constructs in OpenMP to Exploit Pipeline Parallelism on Heterogeneous Multi-core. In: Xiang, Y., Cuzzocrea, A., Hobbs, M., Zhou, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2011. Lecture Notes in Computer Science, vol 7017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24669-2_6


  • DOI: https://doi.org/10.1007/978-3-642-24669-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24668-5

  • Online ISBN: 978-3-642-24669-2

  • eBook Packages: Computer Science (R0)
