Geração Automática de Benchmarks para Compilação Preditiva
Pages 59–67
Abstract
Training a predictive compiler requires a very large collection of benchmarks, one that approximates the universe of programs the compiler will encounter in actual use. Several techniques exist today to generate benchmarks for calibrating predictive compilers. However, when applied to unsafe languages such as C, C++, or assembly dialects, these techniques tend to run into a major obstacle: the generation of executable code. The difficulty of detecting undefined behavior, combined with the difficulty of creating inputs that exercise multiple paths of the same program, makes producing executable benchmarks hard. This paper describes Jotai, a collection of approximately 30 thousand executable programs mined from open-source repositories. Jotai uses a publicly available type inferencer to guarantee that automatically mined code compiles, and uses a domain-specific language to generate valid inputs for programs. This collection has been used to predict the benefit of code optimizations; to find good configurations for the different heuristics used by compilers; and to analyze correlations, for instance, between the number of instructions processed during a program's execution and the time that program takes to finish.
Supplementary Material
Presentation video
Published In
October 2022
75 pages
ISBN:9781450397445
DOI:10.1145/3561320
Copyright © 2022 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
Published: 06 October 2022
Qualifiers
- Research-article
- Research
- Refereed limited
Conference
SBLP 2022
SBLP 2022: XXVI Brazilian Symposium on Programming Languages
October 6 - 7, 2022
Virtual Event, Brazil
Acceptance Rates
Overall Acceptance Rate 22 of 50 submissions, 44%