An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor

  • Conference paper
OpenMP in the Era of Low Power Devices and Accelerators (IWOMP 2013)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8122)

Abstract

Barrier synchronisation has been widely studied since the supercomputer era because of its significant impact on the overall performance of parallel applications. With the current shift to many-core architectures, such as the Intel® Many Integrated Core Architecture, software barriers need to be revisited from an on-chip point of view so that they exploit the specific resources these architectures provide. In this paper, we propose a tree-based barrier that takes advantage of SIMD instructions and of the inter-thread cache locality provided by the 4-way SMT of the Intel® Xeon Phi™ coprocessor. Our SIMD approach achieves a speed-up of up to 2.84x over the default Intel OpenMP* barrier in the EPCC barrier microbenchmark. It also improves the performance of Livermore Loop kernel number six and of the NAS MG benchmark by up to 60% and 21%, respectively.
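
To make the idea of a tree-based barrier with packed arrival flags more concrete, the following is a minimal sketch in C. It is not the authors' implementation and does not use Xeon Phi vector instructions. It assumes groups of four threads (mirroring the 4-way SMT), packs one arrival byte per thread into a 64-bit word as a portable stand-in for a SIMD register, and lets a group leader test its whole group with a single wide comparison. The names `tree_barrier`, `tree_barrier_wait`, `run_barrier_demo`, `GROUP` and `MAX_GROUPS` are hypothetical, and the use of atomic read-modify-write for arrival is a simplification chosen for portability.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define GROUP 4        /* threads per group; mirrors the 4-way SMT (assumption) */
#define MAX_GROUPS 8   /* one byte lane per group leader fits in 64 bits */

/* Hypothetical layout: each group packs one arrival byte per thread into a
 * single word, so the leader tests the whole group with one wide compare. */
typedef struct {
    int ngroups;
    struct { _Alignas(64) _Atomic uint64_t lanes; } group[MAX_GROUPS];
    _Alignas(64) _Atomic uint64_t root;   /* one byte lane per group leader */
    _Alignas(64) _Atomic int release;     /* global sense, flipped by thread 0 */
} tree_barrier;

/* Word with the low byte of the first n lanes set to 1. */
static uint64_t ones(int n) {
    uint64_t p = 0;
    for (int i = 0; i < n; i++) p |= UINT64_C(1) << (8 * i);
    return p;
}

static void tree_barrier_init(tree_barrier *b, int nthreads) {
    assert(nthreads > 0 && nthreads % GROUP == 0);
    assert(nthreads / GROUP <= MAX_GROUPS);
    b->ngroups = nthreads / GROUP;
    for (int g = 0; g < MAX_GROUPS; g++) atomic_init(&b->group[g].lanes, 0);
    atomic_init(&b->root, 0);
    atomic_init(&b->release, 0);
}

/* Each thread keeps its own *sense, initialised to 0. */
static void tree_barrier_wait(tree_barrier *b, int tid, int *sense) {
    int s = (*sense ^= 1);
    int g = tid / GROUP, lane = tid % GROUP;
    /* Arrival: toggle this thread's byte lane in the group word. */
    atomic_fetch_xor(&b->group[g].lanes, UINT64_C(1) << (8 * lane));
    if (lane == 0) {
        /* Leader: a single compare covers all lanes of the group. */
        uint64_t want = s ? ones(GROUP) : 0;
        while (atomic_load(&b->group[g].lanes) != want) sched_yield();
        atomic_fetch_xor(&b->root, UINT64_C(1) << (8 * g));
        if (g == 0) {
            /* Root: wait for every leader, then release everyone. */
            want = s ? ones(b->ngroups) : 0;
            while (atomic_load(&b->root) != want) sched_yield();
            atomic_store(&b->release, s);
        }
    }
    while (atomic_load(&b->release) != s) sched_yield();
}

typedef struct {
    tree_barrier *b;
    int tid, nthreads, rounds;
    _Atomic long *counter;
    _Atomic int *violations;
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    int sense = 0;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->counter, 1);
        tree_barrier_wait(a->b, a->tid, &sense);
        /* After the barrier, every thread must have incremented. */
        if (atomic_load(a->counter) != (long)a->nthreads * (r + 1))
            atomic_fetch_add(a->violations, 1);
        tree_barrier_wait(a->b, a->tid, &sense);
    }
    return NULL;
}

/* Returns the number of observed barrier violations (0 if it held). */
int run_barrier_demo(int nthreads, int rounds) {
    tree_barrier b;
    pthread_t th[GROUP * MAX_GROUPS];
    worker_arg args[GROUP * MAX_GROUPS];
    _Atomic long counter;
    _Atomic int violations;
    tree_barrier_init(&b, nthreads);
    atomic_init(&counter, 0);
    atomic_init(&violations, 0);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (worker_arg){ &b, t, nthreads, rounds, &counter, &violations };
        pthread_create(&th[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++) pthread_join(th[t], NULL);
    return atomic_load(&violations);
}
```

A call such as `run_barrier_demo(8, 200)` should return 0 when the barrier holds. The sketch needs a C11 compiler with pthread support, and the `sched_yield()` calls in the spin loops are there only so the demo makes progress on machines with fewer cores than threads. A tuned version would spin without yielding and would replace the 64-bit compare with an actual vector compare over a full cache line of flags.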

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Caballero, D., Duran, A., Martorell, X. (2013). An OpenMP* Barrier Using SIMD Instructions for Intel® Xeon Phi™ Coprocessor. In: Rendell, A.P., Chapman, B.M., Müller, M.S. (eds) OpenMP in the Era of Low Power Devices and Accelerators. IWOMP 2013. Lecture Notes in Computer Science, vol 8122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40698-0_8

  • DOI: https://doi.org/10.1007/978-3-642-40698-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40697-3

  • Online ISBN: 978-3-642-40698-0

  • eBook Packages: Computer Science, Computer Science (R0)
