Using Hardware Counters to Predict Vectorization

Watkinson, Neftali; Shivam, Aniket; Chen, Zhi; Veidenbaum, Alexander; Nicolau, Alexandru; Gong, Zhangxiaowen

doi:10.1007/978-3-030-35225-7_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11403)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Vectorization is the process of transforming the scalar implementation of an algorithm into vector form. This transformation aims to exploit parallelism by generating microprocessor vector instructions. Using abstract models and source-level information, compilers can identify opportunities for auto-vectorization. However, compilers do not always predict the runtime effects accurately, and in some cases they fail to identify vectorization opportunities altogether. Either failure ultimately results in no performance improvement.
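As a rough illustration of the scalar-to-vector transformation described above, the sketch below strip-mines a simple loop into fixed-width blocks plus a scalar remainder loop. It is only an emulation in Python: the function names, the `a * x + y` kernel, and the lane count of 4 are assumptions chosen for the example, not something taken from the paper, and a real compiler would emit hardware vector instructions instead of list operations.

```python
# Illustrative only: emulates SIMD lanes with Python lists.
VECTOR_WIDTH = 4  # assumed lane count (e.g. four 32-bit floats in a 128-bit register)


def saxpy_scalar(a, x, y):
    """Scalar form: one element is processed per loop iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_vector(a, x, y):
    """Vector form: the loop is strip-mined into blocks of VECTOR_WIDTH."""
    n = len(x)
    out = [0.0] * n
    main_end = n - n % VECTOR_WIDTH
    for i in range(0, main_end, VECTOR_WIDTH):
        xs = x[i:i + VECTOR_WIDTH]
        ys = y[i:i + VECTOR_WIDTH]
        # Stands in for one vector multiply-add over VECTOR_WIDTH lanes.
        out[i:i + VECTOR_WIDTH] = [a * xv + yv for xv, yv in zip(xs, ys)]
    for i in range(main_end, n):
        # Scalar remainder loop for elements that do not fill a full block.
        out[i] = a * x[i] + y[i]
    return out


x = [float(i) for i in range(10)]
y = [1.0] * 10
same = saxpy_scalar(2.0, x, y) == saxpy_vector(2.0, x, y)
```

Both functions compute the same result; the vector form simply groups independent iterations so that each group could map to a single vector instruction. Whether that grouping is legal and profitable for a given loop is the question a compiler has to answer.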

This paper takes a new perspective by using runtime hardware counters to predict the potential for loop vectorization. Using supervised machine learning models, we can detect instances where vectorization can be applied but the compiler fails to apply it, with 80% validation accuracy. We also predict profitability and performance on different architectures.
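To make the idea of supervised prediction from counter data concrete, here is a minimal sketch under stated assumptions. The abstract does not name the models or counters used, so this example swaps in a 1-nearest-neighbor classifier with k-fold cross-validation purely for brevity. The three features (`ipc`, `l1_miss_rate`, `branch_miss_rate`) and all sample values are hypothetical and synthetic; they are not data from the paper.

```python
import math

# Synthetic, made-up samples: (ipc, l1_miss_rate, branch_miss_rate) -> label.
# Label 1 = loop was vectorizable, 0 = not. Values are illustrative only.
SAMPLES = [
    ((1.9, 0.02, 0.01), 1),
    ((2.1, 0.03, 0.01), 1),
    ((1.8, 0.01, 0.02), 1),
    ((2.0, 0.02, 0.02), 1),
    ((0.7, 0.20, 0.10), 0),
    ((0.6, 0.25, 0.12), 0),
    ((0.8, 0.18, 0.09), 0),
    ((0.5, 0.22, 0.11), 0),
]


def predict(train, features):
    """1-nearest-neighbor: return the label of the closest training sample."""
    nearest = min(train, key=lambda s: math.dist(s[0], features))
    return nearest[1]


def cross_validate(samples, k):
    """k-fold cross-validation: fraction of held-out samples predicted correctly."""
    correct = 0
    for fold in range(k):
        # Every k-th sample starting at `fold` is held out for testing.
        test = samples[fold::k]
        train = [s for i, s in enumerate(samples) if i % k != fold]
        correct += sum(predict(train, f) == label for f, label in test)
    return correct / len(samples)


accuracy = cross_validate(SAMPLES, k=4)
```

The two synthetic classes are separable by construction, so the accuracy computed here says nothing about real loops. The point is only the workflow: counter values become feature vectors, labels record whether vectorization applied, and held-out folds give a validation accuracy comparable in kind to the figure quoted above. A realistic pipeline would also need to scale the features, since raw counters differ by orders of magnitude.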

We evaluate a wide range of hardware counters across different machine learning models. We show that dynamic features, extracted from performance data, implicitly include useful information about the host machine and runtime program behavior.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Award 1533912.

Author information

Authors and Affiliations

Department of Computer Science, University of California, Irvine, Irvine, USA
Neftali Watkinson, Aniket Shivam, Zhi Chen, Alexander Veidenbaum & Alexandru Nicolau
Department of Computer Science, University of Illinois, Urbana-Champaign, Champaign, USA
Zhangxiaowen Gong

Corresponding author

Correspondence to Neftali Watkinson.

Editor information

Editors and Affiliations

Texas A&M University, College Station, TX, USA
Lawrence Rauchwerger

About this paper

Cite this paper

Watkinson, N., Shivam, A., Chen, Z., Veidenbaum, A., Nicolau, A., Gong, Z. (2019). Using Hardware Counters to Predict Vectorization. In: Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2017. Lecture Notes in Computer Science, vol 11403. Springer, Cham. https://doi.org/10.1007/978-3-030-35225-7_1

DOI: https://doi.org/10.1007/978-3-030-35225-7_1
Published: 15 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35224-0
Online ISBN: 978-3-030-35225-7
eBook Packages: Computer Science, Computer Science (R0)
