
Multi-objective Exploration for Practical Optimization Decisions in Binary Translation

Published: 07 October 2019

Abstract

In mobile system design, hardware/software (HW/SW) co-design offers important advantages by creating specialized hardware for performance or power optimization. Dynamic binary translation (DBT) is a key component of co-design. During translation, a dynamic optimizer in the DBT system applies various software optimizations to improve the quality of the translated code. Because dynamic optimization runs online, optimization time is exposed as run-time overhead, and useful analyses are often ruled out by their high cost. A dynamic optimizer must therefore make smart decisions with limited analysis information, which complicates the design of optimization decision models and often causes human-made heuristics to fail. In mobile systems, the problem is even more challenging because of strict constraints on computing capability and memory size.

To overcome this challenge, we investigate the opportunity to build practical optimization decision models for DBT using machine learning techniques. As a first step, loop unrolling is chosen as the representative optimization. We base our approach on an industrial-strength DBT infrastructure and evaluate it with 17,116 unrollable loops collected from 200 benchmarks and real-life programs across various domains. Using all available features that are potentially important for the loop unrolling decision, we identify the best classification algorithm for our infrastructure, considering both prediction accuracy and cost. A greedy feature selection algorithm is then applied to the chosen classifier to identify its significant features and cut down the feature space. Keeping only the significant features, the best affordable classifier, which satisfies the budgets allocated to the decision process, achieves 74.5% prediction accuracy for the optimal unroll factor and an average 20.9% reduction in dynamic instruction count during steady-state execution of the translated code. For comparison, the best baseline heuristic achieves 46.0% prediction accuracy with an average 13.6% instruction count reduction. Given that the infrastructure is already highly optimized and the ideal upper bound on instruction reduction is observed at 23.8%, we believe this result is noteworthy.
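The greedy feature selection step described above can be sketched as follows. This is an illustrative sketch only: the feature names, the synthetic training data, and the simple 1-nearest-neighbour classifier are stand-ins invented for the example, not the paper's actual features, data set, or infrastructure-specific classifier.

```python
import random

# Hypothetical illustration of greedy forward feature selection for an
# unroll-factor classifier. Features and data are synthetic stand-ins.
random.seed(0)

FEATURES = ["trip_count", "body_size", "mem_ops", "branch_count"]

def make_sample():
    # Synthetic loop: the "optimal" unroll factor depends only on
    # trip_count and body_size, so selection should recover those two.
    x = {f: random.randint(0, 15) for f in FEATURES}
    if x["trip_count"] > 8:
        y = 8 if x["body_size"] < 8 else 4
    else:
        y = 2 if x["body_size"] < 8 else 1
    return x, y

train = [make_sample() for _ in range(200)]
test = [make_sample() for _ in range(100)]

def predict(sample, feats):
    # 1-nearest-neighbour restricted to the selected feature subset.
    dist = lambda t: sum((t[0][f] - sample[f]) ** 2 for f in feats)
    return min(train, key=dist)[1]

def accuracy(feats):
    return sum(predict(x, feats) == y for x, y in test) / len(test)

# Greedy forward selection: repeatedly add the single feature that
# improves held-out accuracy the most; stop when no candidate helps.
selected, best = [], 0.0
while True:
    candidates = [f for f in FEATURES if f not in selected]
    if not candidates:
        break
    scores = {f: accuracy(selected + [f]) for f in candidates}
    f = max(scores, key=scores.get)
    if scores[f] <= best:
        break
    selected, best = selected + [f], scores[f]

print(f"selected={selected}, accuracy={best:.2f}")
```

Because the stopping rule requires each added feature to improve held-out accuracy, uninformative features are pruned and the feature space shrinks, which is the property that makes the resulting classifier affordable within a decision-time budget.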


Published in

ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
October 2019, 1423 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3365919

          Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 April 2019
• Revised: 1 June 2019
• Accepted: 1 July 2019
• Published: 7 October 2019

Published in TECS Volume 18, Issue 5s


          Qualifiers

          • research-article
          • Research
          • Refereed
