Abstract
As a thread-level automatic parallelization technique, thread-level speculation (TLS) can partition irregular serial programs into multiple threads and implement these threads in parallel on multi-core architectures to improve the performance of programs. To tackle the problem that the conventional heuristic rule-based (HR-based) thread partition approach partitions programs of different characteristics with the same scheme and several programs have bad partition results, this paper proposes a program characteristic-based thread partition approach (ProCTA), which uses a machine learning method to learn the knowledge of thread partition from TLS sample set and predicts thread partition schemes for unknown programs in accordance with programs’ characteristics and finally applies the schemes to thread partition. In Prophet compilation system, Olden benchmarks are used to evaluate ProCTA, and a comparison is made between ProCTA and conventional heuristic rules-based partition approach. The experimental results show that the proposed approach can deliver an average 18.24% speedup improvement than HR-based thread partition approach.
Similar content being viewed by others
References
Carlisle MC (1996) Olden: parallelizing programs with dynamic data structures on distributed-memory machines. PhD thesis, Princeton University
Chen Z, Zhao YL, Pan XY, Dong ZY, Gao B, Zhong ZW (2009) An overview of prophet. In: International Conference on Algorithms and Architectures for Parallel Processing. Springer, pp 396–407
Codrescu L, Wills DS (1999) On dynamic speculative thread partitioning and the MEM-slicing algorithm, pp 40–46
Dong Z, Zhao Y, Wei Y, Wang X, Song S (2009) Prophet: a speculative multi-threading execution model with architectural support based on CMP. In: International Conference on Scalable Computing and Communications; Eighth International Conference on Embedded Computing. SCALCOM-EMBEDDEDCOM’09. IEEE, pp 103–108
Franklin M (1995) Multiscalar processors. ACM Sigarch Comput Archit News 23(2):414–425
Grewe D, Wang Z, O’Boyle MFP (2011) A workload-aware mapping approach for data-parallel programs. In: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers. ACM, pp 117–126
Hammond L, Hubbert BA, Siu M, Prabhu MK (2000) The stanford hydra CMP. IEEE Micro 20(2):71–84
Johnson TA, Eigenmann R, Vijaykumar TN (2004) Min-cut program decomposition for thread-level speculation. In: ACM Sigplan Notices, vol 39. ACM, pp 59–70
Li D-C, Lin Y-S (2006) Using virtual sample generation to build up management knowledge in the early manufacturing stages. Eur J Oper Res 175(1):413–434
Li Y, Zhao Y, Sun L, Shen M (2017) Optimizing partition thresholds in speculative multithreading. ICIC Express Lett 11(6):1053–1061
Li Y, Zhao Y, Shi J (2016) A hybrid samples generation approach in speculative multithreading. In: 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, pp 35–41
Li Y, Zhao Y, Sun L, Shen M (2017) A hybrid sample generation approach in speculative multithreading. J Supercomput 3:1–33
Li Y, Zhao Y, Qiangsheng W (2017) Gba:a graph based thread partition approach in speculative multithreading. Concurr Comput Pract Exp 29(21):e4294
Liu B, Zhao Y, Li M, Liu Y, Feng B (2012) A virtual sample generation approach for speculative multithreading using feature sets and abstract syntax trees. In: 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies. IEEE, pp 39–44
Liu B, Zhao Y, Li Y, Sun Y, Feng B (2014) A thread partitioning approach for speculative multithreading. J Supercomput 67(3):778–805
Liu B, Zhao Y, Li Y, Sun Y, Feng B (2014) A thread partitioning approach for speculative multithreading. J Supercomput 67(3):778–805
Liu W, Tuck J, Ceze L, Ahn W, Strauss K, Renau J, Torrellas J (2006) Posh: a TLS compiler that exploits program structure. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, pp 158–167
Madriles C, García-Quiñones C, Sánchez J, Marcuello P, González A, Tullsen DM, Wang H, Shen JP (2008) Mitosis: a speculative multithreaded processor based on precomputation slices. IEEE Trans Parallel Distrib Syst 19(7):914–925
Ohsawa T, Takagi M, Kawahara S, Matsushita S (2005) Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities. In: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, pp 81–92
Prabhu MK, Olukotun K (2005) Exposing speculative thread parallelism in spec2000. In: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, pp 142–152
Quinones CG, Madriles C, Sanchez J, Marcuello P, Gonzalez A, Tullsen DM (2005) Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In: ACM Sigplan Notices, vol 40. ACM, pp 269–279
Quinones CG, Madriles C, Sanchez J, Marcuello P, Gonzalez A, Tullsen DM (2005) Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In: ACM Sigplan Conference on Programming Language Design & Implementation
Rogers A, Carlisle MC, Reppy JH, Hendren LJ (1995) Supporting dynamic data structures on distributed-memory machines. ACM Trans Program Lang Syst 17(2):233–263
Song S, Zhao Y, Feng B, Wei Y, Wang X, Zhao H (2010) Prophet+: an extended multicore simulator for speculative multithreading. J Xian Jiaotong Univ 44(10):13–15
Steffan JG, Colohan C, Zhai A, Mowry TC (2005) The stampede approach to thread-level speculation. ACM Trans Comput Syst 23(3):253–300
Wang Z, Zhao Y, Liu Y, Chen Z, Lv C, Li Y (2017) A speculative parallel decompression algorithm on apache spark. J Supercomput 73(9):1–30
Wang Z, Zhao Y, Liu Y, Lv C (2018) A speculative parallel simulated annealing algorithm based on apache spark. Concurr Comput Pract Exp 1:e4429
Wilson RP, French RS, Wilson CS, Amarasinghe SP, Anderson JM, Tjiang SWK, Liao S-W, Tseng C-W, Hall MW, Lam MS et al (1994) SUIF: an infrastructure for research on parallelizing and optimizing compilers. ACM Sigplan Notices 29(12):31–37
Acknowledgements
We thank all members of Henan Joint International Research Laboratory of Cyberspace Security Applications for their great support and give our best hope to them for their collaboration. We also thank reviewers for their careful comments and suggestions. This work is supported by National Natural Science Foundation of China Grant Nos. 61772174 and 61370220, and Plan For Scientific Innovation Talent of Henan Province Grant No. 174200510011, as well as Program for Innovative Research Team (in Science and Technology) in University of Henan Province Grant No. 15IRTSTHN010. This work is also supported by National Key R&D Plan under Grant No. 2016YFE0104600.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Li, Y., Zhang, Z., Zhang, L. et al. ProCTA: program characteristic-based thread partition approach. J Supercomput 75, 7366–7390 (2019). https://doi.org/10.1007/s11227-019-02943-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-019-02943-1