Geração Automática de Benchmarks para Compilação Preditiva
Pages 59–67
Abstract
Training a predictive compiler requires a very large collection of benchmarks, one that approximates the universe of programs the compiler will encounter in actual use. Several techniques exist today to generate benchmarks for calibrating predictive compilers. However, when applied to unsafe languages such as C, C++, or assembly dialects, these techniques tend to run into a major obstacle: the generation of executable code. The difficulty of detecting undefined behavior, combined with the difficulty of creating inputs that exercise multiple paths of the same program, makes producing executable benchmarks hard. This paper describes Jotai, a collection of approximately 30 thousand executable programs mined from open-source repositories. Jotai uses a publicly available type inferencer to guarantee that automatically mined code compiles, and uses a domain-specific language to generate valid inputs for programs. This collection has been used to predict the benefit of code optimizations; to find good configurations for the different heuristics used by compilers; and to analyze correlations, for instance, between the number of instructions processed during a program's execution and the time that program takes to finish.
Supplementary Material
Presentation video
Published In
October 2022
75 pages
ISBN:9781450397445
DOI:10.1145/3561320
Copyright © 2022 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 06 October 2022
Qualifiers
- Research-article
- Research
- Refereed limited
Conference
SBLP 2022
SBLP 2022: XXVI Brazilian Symposium on Programming Languages
October 6 - 7, 2022
Virtual Event, Brazil
Acceptance Rates
Overall Acceptance Rate 22 of 50 submissions, 44%