Abstract
In the design of mobile systems, hardware/software (HW/SW) co-design offers important advantages by enabling specialized hardware for performance and power optimizations. Dynamic binary translation (DBT) is a key component of co-designed systems. During translation, a dynamic optimizer in the DBT system applies various software optimizations to improve the quality of the translated code. Under dynamic optimization, optimization time is an exposed run-time overhead, and useful analyses are often restricted by their high costs. A dynamic optimizer therefore has to make smart decisions with limited analysis information, which complicates the design of optimization decision models and often causes human-made heuristics to fail. In mobile systems, this problem is even more challenging because of strict constraints on computing capability and memory size.
To overcome this challenge, we investigate the opportunity to build practical optimization decision models for DBT using machine learning techniques. As a first step, loop unrolling is chosen as the representative optimization. We base our approach on an industrial-strength DBT infrastructure and evaluate it with 17,116 unrollable loops collected from 200 benchmarks and real-life programs across various domains. Starting from all available features that are potentially important to the loop unrolling decision, we identify the best classification algorithm for our infrastructure, considering both prediction accuracy and cost. A greedy feature selection algorithm is then applied to this classifier to identify its significant features and cut down the feature space. Keeping only the significant features, the best affordable classifier, which satisfies the budgets allocated to the decision process, achieves 74.5% prediction accuracy for the optimal unroll factor and an average 20.9% reduction in dynamic instruction count during steady-state execution of the translated code. For comparison, the best baseline heuristic achieves 46.0% prediction accuracy with an average 13.6% instruction count reduction. Given that the infrastructure is already highly optimized and the ideal upper bound for instruction reduction is observed at 23.8%, we believe this result is noteworthy.
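The approach above casts unroll-factor selection as classification over loop features, then prunes the feature space with greedy forward selection. A minimal self-contained sketch of that workflow is shown below; the feature names, the synthetic labeling rule, the candidate unroll factors, and the 1-nearest-neighbour classifier are all illustrative assumptions standing in for the paper's dataset and its chosen classification algorithm.

```python
# Sketch of classifier-driven unroll-factor prediction with greedy forward
# feature selection. All data here is synthetic; in the paper, labels come
# from measured dynamic instruction counts, not a hidden rule.
import random

random.seed(0)

FEATURES = ["trip_count", "body_insts", "live_regs", "mem_ops", "branches"]
UNROLL_FACTORS = [1, 2, 4, 8]  # hypothetical candidate factors

def make_loop():
    # Synthetic loop descriptor: a feature vector plus a "best" unroll
    # factor derived from a simple hidden rule (stand-in for profiling).
    f = {name: random.randint(1, 32) for name in FEATURES}
    best = 8 if f["body_insts"] < 8 else (4 if f["live_regs"] < 16 else 1)
    return f, best

data = [make_loop() for _ in range(500)]
train, test = data[:400], data[400:]

def accuracy(feature_subset, train, test):
    # 1-nearest-neighbour classification restricted to the chosen features.
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in feature_subset)
    hits = 0
    for f, label in test:
        nearest = min(train, key=lambda t: dist(t[0], f))
        hits += nearest[1] == label
    return hits / len(test)

# Greedy forward selection: repeatedly add the single feature that most
# improves held-out accuracy, stopping when no remaining feature helps.
selected, best_acc = [], 0.0
while True:
    candidates = [f for f in FEATURES if f not in selected]
    if not candidates:
        break
    acc, feat = max((accuracy(selected + [f], train, test), f)
                    for f in candidates)
    if acc <= best_acc:
        break
    selected.append(feat)
    best_acc = acc

print("selected features:", selected)
print("held-out accuracy: %.2f" % best_acc)
```

The greedy loop mirrors the paper's motivation: each retained feature has a measurable cost at translation time, so selection stops as soon as an extra feature no longer pays for itself in prediction accuracy.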
Index Terms
- Multi-objective Exploration for Practical Optimization Decisions in Binary Translation