ABSTRACT
Jupyter notebooks are widely adopted by data analysts because they provide a convenient environment for presenting computational results as a literate-programming document that combines code snippets, rich text, and inline visualizations. Such documents are meant to be computational narratives supplemented with self-explanatory text, but recent studies have shown that this explanatory text is often lacking in practice. Efforts in the software engineering community to improve code comprehension in literate programming remain limited. As a first step toward addressing this gap, this paper presents HeaderGen, a prototype Jupyter notebook annotator that automatically creates a narrative structure in notebooks by classifying code cells according to the machine learning workflow and annotating them accordingly. HeaderGen statically analyzes the notebook to generate a markdown header for each code cell, and links these headers to a clickable table of contents for easier navigation. Finally, we discuss our vision and the opportunities this prototype opens up.
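The pipeline described above (static analysis of each code cell, classification into a machine learning workflow phase, insertion of a markdown header, and a linked table of contents) can be illustrated with a minimal sketch. This is not HeaderGen's implementation. It assumes a hand-written, hypothetical mapping `PHASE_BY_CALL` from called function names to workflow phases, and it uses Python's standard `ast` module for a much cruder, name-based analysis than a real tool would need. The helper names `called_names`, `classify_cell`, `annotate`, and `annotate_file` are illustrative, and the notebook is handled as its plain nbformat v4 JSON structure.

```python
import ast
import json

# HYPOTHETICAL mapping from called function/method names to ML workflow phases.
# Illustrative only; this is not HeaderGen's actual taxonomy or analysis.
PHASE_BY_CALL = {
    "read_csv": "Data Loading",
    "read_json": "Data Loading",
    "dropna": "Data Preparation",
    "fillna": "Data Preparation",
    "train_test_split": "Data Preparation",
    "fit": "Model Training",
    "predict": "Model Evaluation",
    "accuracy_score": "Model Evaluation",
    "plot": "Visualization",
    "hist": "Visualization",
}


def called_names(source: str) -> list[str]:
    """Collect the names of all functions/methods called in a code cell."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # e.g. cells that contain IPython magics such as %matplotlib
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Attribute):  # obj.method(...)
                names.append(node.func.attr)
            elif isinstance(node.func, ast.Name):  # function(...)
                names.append(node.func.id)
    return names


def classify_cell(source: str) -> list[str]:
    """Map a code cell to the distinct workflow phases its calls belong to."""
    phases = []
    for name in called_names(source):
        phase = PHASE_BY_CALL.get(name)
        if phase and phase not in phases:
            phases.append(phase)
    return phases


def annotate(nb: dict) -> dict:
    """Return a new notebook dict with a markdown header before each
    classified code cell and a linked table of contents on top."""
    new_cells, toc = [], []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]  # nbformat allows a string or a list of lines
            source = "".join(src) if isinstance(src, list) else src
            phases = classify_cell(source)
            if phases:
                anchor = f"cell-{len(toc)}"
                title = ", ".join(phases)
                # An HTML anchor inside the markdown gives the TOC link a target.
                header = f'## <a id="{anchor}"></a>{title}'
                new_cells.append({"cell_type": "markdown", "metadata": {}, "source": header})
                toc.append(f"- [{title}](#{anchor})")
        new_cells.append(cell)
    toc_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": "# Table of Contents\n" + "\n".join(toc),
    }
    return {**nb, "cells": [toc_cell] + new_cells}


def annotate_file(path_in: str, path_out: str) -> None:
    """Read a .ipynb file, annotate it, and write the result to a new file."""
    with open(path_in, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path_out, "w", encoding="utf-8") as f:
        json.dump(annotate(nb), f, indent=1)
```

For example, `annotate_file("analysis.ipynb", "analysis_annotated.ipynb")` (hypothetical file names) would write a copy of the notebook with the generated headers and table of contents. The main simplification is matching on bare names such as `fit`: this cannot distinguish `model.fit` from an unrelated object's `fit` method, which is presumably why a practical annotator needs type or call-graph information. The sketch also does not add the cell `id` fields that nbformat 4.5 notebooks carry.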
Index Terms
- Automated cell header generator for Jupyter notebooks