Abstract
Parallelization constructs in OpenMP, such as parallel for or taskloop, are typically restricted to loops that have no loop-carried dependences (DOALL) or that contain well-known structured dependence patterns (e.g., reduction). These restrictions prevent the parallelization of many computationally intensive may DOACROSS loops, i.e., loops for which the compiler cannot prove the absence of loop-carried dependences even though none may materialize at runtime. This paper proposes a new clause for taskloop that enables speculative parallelization of may DOACROSS loops: the tls clause. We also present an initial evaluation which reveals that: (a) for certain loops, slowdowns incurred by DOACROSS techniques can be turned into speed-ups of up to 2.14× by applying speculative parallelization of tasks; and (b) the task-scheduling policy implemented in the Intel OpenMP runtime exacerbates the rate of order-inversion aborts after applying the taskloop-tls parallelization to a loop.
Acknowledgments
The authors would like to thank the anonymous reviewers for their insightful comments. This work is supported by FAPESP (grants 18/07446-8 and 18/15519-5).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Salamanca, J., Baldassin, A. (2019). A Proposal for Supporting Speculation in the OpenMP taskloop Construct. In: Fan, X., de Supinski, B., Sinnen, O., Giacaman, N. (eds) OpenMP: Conquering the Full Hardware Spectrum. IWOMP 2019. Lecture Notes in Computer Science(), vol 11718. Springer, Cham. https://doi.org/10.1007/978-3-030-28596-8_17
Print ISBN: 978-3-030-28595-1
Online ISBN: 978-3-030-28596-8