ABSTRACT
The massive trend of integrating data-driven AI capabilities into traditional software systems is raising intriguing new challenges. One such challenge is achieving a smooth transition from the exploratory phase of machine learning projects, in which data scientists build prototype models in the lab, to the production phase, in which software engineers translate those prototypes into production-ready AI components. To narrow the gap between these two phases, the tools and practices adopted by data scientists can be improved by incorporating consolidated software engineering solutions. In particular, computational notebooks play a prominent role in determining the quality of data science prototypes. In my research project, I address this challenge by studying best practices for collaboration with computational notebooks and by proposing proof-of-concept tools that foster compliance with such guidelines.
Assessing the quality of computational notebooks for a frictionless transition from exploration to production