DOI: 10.1145/3561320.3561323

Geração Automática de Benchmarks para Compilação Preditiva (Automatic Generation of Benchmarks for Predictive Compilation)

Published: 06 October 2022

Abstract

Training a predictive compiler requires a very large collection of benchmarks, one that approximates the universe of programs the compiler will encounter in practice. Several techniques exist today for generating benchmarks to calibrate predictive compilers. However, when applied to unsafe languages such as C, C++, or assembly dialects, these techniques tend to run into a challenge: producing executable code. The difficulty of detecting undefined behavior, combined with the difficulty of creating inputs that exercise several paths of the same program, makes the generation of executable benchmarks hard. This paper describes Jotai, a collection of approximately 30 thousand executable programs mined from open-source repositories. Jotai uses a publicly available type inference engine to guarantee that automatically mined code can be compiled, and it uses a domain-specific language to generate valid inputs for the programs. This collection has been used to predict the benefit of code optimizations; to find good configurations for the different heuristics used by compilers; and to analyze correlations, for example, between the number of instructions processed during the execution of a program and the time that program takes to terminate.
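
To make the idea concrete, the sketch below shows what an executable benchmark in the spirit of Jotai could look like: a function mined from an open-source repository paired with a generated driver that supplies valid inputs and exercises different paths. The function count_positives, the switch-based driver, and the three input cases are illustrative assumptions only; they do not reproduce Jotai's actual output or its input-description language.

/*
 * Hypothetical sketch, not the actual Jotai format: a mined C function
 * plus an automatically generated driver. The function name, the
 * argument ranges, and the switch-based case selection are assumptions
 * made for illustration.
 */
#include <stdio.h>
#include <stdlib.h>

/* Function as it might be extracted from an open-source repository. */
int count_positives(const int *v, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] > 0)
            count++;
    }
    return count;
}

/* Driver that an input generator could emit: each case runs the mined
 * function on inputs that avoid undefined behavior (no out-of-bounds
 * accesses, no reads of uninitialized memory) while covering different
 * execution paths. */
int main(int argc, char **argv) {
    int which = (argc > 1) ? atoi(argv[1]) : 0;
    switch (which) {
    case 0: /* empty input: the loop body never executes */
        printf("%d\n", count_positives(NULL, 0));
        break;
    case 1: { /* small mixed array: both branches of the test are taken */
        int v[] = {-3, 7, 0, 42};
        printf("%d\n", count_positives(v, 4));
        break;
    }
    default: { /* larger, heap-allocated input */
        int n = 1000;
        int *v = malloc(n * sizeof(int));
        if (v == NULL)
            return 1;
        for (int i = 0; i < n; i++)
            v[i] = i - n / 2;
        printf("%d\n", count_positives(v, n));
        free(v);
        break;
    }
    }
    return 0;
}

Drivers of this kind make the mined functions runnable, so the collection can be used not only for static metrics but also to measure dynamic quantities such as instruction counts and running time.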

Supplementary Material

MP4 File (jotai_presentation.mp4)
Presentation video



      Published In

      SBLP '22: Proceedings of the XXVI Brazilian Symposium on Programming Languages
      October 2022
      75 pages
      ISBN:9781450397445
      DOI:10.1145/3561320
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. Autotuning
      2. Benchmarks
      3. Predictive Compilation


      Conference

      SBLP 2022
      SBLP 2022: XXVI Brazilian Symposium on Programming Languages
      October 6 - 7, 2022
      Virtual Event, Brazil

      Acceptance Rates

      Overall Acceptance Rate 22 of 50 submissions, 44%

