DOI: 10.1145/3510454.3517055
research-article

Assessing the quality of computational notebooks for a frictionless transition from exploration to production

Published: 19 October 2022

ABSTRACT

The massive trend of integrating data-driven AI capabilities into traditional software systems raises new, intriguing challenges. One such challenge is achieving a smooth transition from the explorative phase of Machine Learning projects - in which data scientists build prototypical models in the lab - to their production phase - in which software engineers translate prototypes into production-ready AI components. To narrow the gap between these two phases, the tools and practices adopted by data scientists could be improved by incorporating consolidated software engineering solutions. In particular, computational notebooks play a prominent role in determining the quality of data science prototypes. In my research project, I address this challenge by studying best practices for collaboration with computational notebooks and by proposing proof-of-concept tools that foster compliance with such guidelines.
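As an illustration of the kind of automated quality check such proof-of-concept tools might perform (this sketch is hypothetical and not the paper's actual tooling), the notebook-quality literature often uses linear execution order of code cells as a proxy for reproducibility. The check below reads a notebook's JSON and verifies that executed code cells have strictly increasing execution counts:

```python
# Hypothetical sketch: one notebook-quality proxy -- whether code
# cells were executed top-to-bottom (strictly increasing counts),
# a common reproducibility indicator for Jupyter notebooks.
import json


def executed_in_order(notebook_json: str) -> bool:
    """Return True if all executed code cells in the notebook have
    strictly increasing execution counts (linear execution)."""
    nb = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
        and cell.get("execution_count") is not None
    ]
    # Strictly increasing: sorted order with no duplicate counts.
    return counts == sorted(counts) and len(counts) == len(set(counts))


# Minimal example: two cells run out of order fail the check.
nb = json.dumps({"cells": [
    {"cell_type": "code", "execution_count": 2, "source": "x = 1"},
    {"cell_type": "code", "execution_count": 1, "source": "print(x)"},
]})
print(executed_in_order(nb))  # False
```

A real compliance tool would combine several such heuristics (e.g., presence of markdown documentation, absence of unused cells), but each one reduces to a simple static check over the notebook's JSON structure.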


      • Published in

        cover image ACM Conferences
        ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
        May 2022
        394 pages
        ISBN:9781450392235
        DOI:10.1145/3510454

        Copyright © 2022 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
