DOI:10.1145/3464968.3468410
short-paper

Automated cell header generator for Jupyter notebooks

Published: 11 July 2021

ABSTRACT

Jupyter notebooks are now widely adopted by data analysts because they provide a convenient environment for presenting computational results in a literate-programming document that combines code snippets, rich text, and inline visualizations. Literate-programming documents are intended to be computational narratives supplemented with self-explanatory text, but recent studies have shown that such explanatory text is often lacking in practice. Efforts in the software engineering community to improve code comprehension in literate programming are limited. As a first step toward addressing this, this paper presents a prototype Jupyter notebook annotator, HeaderGen, that automatically creates a narrative structure in notebooks by classifying and annotating code cells according to the machine learning workflow. HeaderGen statically analyzes the notebook to generate a markdown cell header for each code cell and, in addition, links these cell headers to a clickable table of contents for easier navigation. Further, we discuss our vision and opportunities based on this prototype.
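
The annotation idea described in the abstract can be illustrated with a minimal sketch. This is not HeaderGen's implementation: the abstract does not specify the analysis or the classification rules, so the `PHASE_BY_CALL` table, the phase names, the `cell-N` anchors, and all function names below are hypothetical. The sketch only assumes that a notebook is available as an nbformat-4-style dictionary and uses Python's standard `ast` module to inspect code cells without executing them.

```python
import ast

# Hypothetical mapping from called function names to workflow phases.
# The abstract does not give HeaderGen's real rules; these are placeholders.
PHASE_BY_CALL = {
    "read_csv": "Data Loading",
    "dropna": "Data Preparation",
    "train_test_split": "Data Preparation",
    "fit": "Model Training",
    "predict": "Model Evaluation",
    "score": "Model Evaluation",
}


def called_names(source):
    """Collect names of functions and methods called in `source` (no execution)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:  # e.g. IPython magics such as "%matplotlib inline"
        return []
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                names.append(func.attr)  # obj.method(...)
            elif isinstance(func, ast.Name):
                names.append(func.id)    # function(...)
    return names


def classify(source):
    """Return the sorted workflow phases detected in a cell, or ["Other"]."""
    phases = {PHASE_BY_CALL[n] for n in called_names(source) if n in PHASE_BY_CALL}
    return sorted(phases) or ["Other"]


def markdown_cell(text):
    """Build a markdown cell in the nbformat-4 dictionary layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def annotate(notebook):
    """Return a copy of `notebook` with a header before each code cell and a TOC."""
    new_cells, toc = [], []
    index = 0
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            index += 1
            source = cell["source"]
            if isinstance(source, list):  # nbformat allows a list of lines
                source = "".join(source)
            title = ", ".join(classify(source))
            anchor = f"cell-{index}"  # hypothetical anchor naming scheme
            new_cells.append(markdown_cell(f'<a id="{anchor}"></a>\n### {title}'))
            toc.append(f"- [{index}. {title}](#{anchor})")
        new_cells.append(cell)
    toc_cell = markdown_cell("## Table of Contents\n" + "\n".join(toc))
    return {**notebook, "cells": [toc_cell] + new_cells}
```

Calling `annotate` on a notebook loaded with `json.load` returns a new dictionary that can be written back with `json.dump`. Matching on call names alone is a deliberate simplification; it cannot tell which library a call such as `fit` belongs to, which is the kind of question a more precise static analysis would need to answer.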

    Published in

      AISTA 2021: Proceedings of the 1st ACM International Workshop on AI and Software Testing/Analysis
      July 2021
      20 pages
      ISBN: 9781450385411
      DOI: 10.1145/3464968
      General Chairs: Shuai Wang, Xiaofei Xie, Lei Ma

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
