DOI: 10.1145/3411763.3451679
poster

A Closer Look at Machine Learning Code

Published: 08 May 2021

Abstract

Software that uses Machine Learning algorithms is becoming ever more ubiquitous, which makes good development processes and practices equally important. Whether insights from software development research can be applied remains open, though, since it is not yet clear whether data-driven development has the same requirements as its traditional counterpart. We used eye tracking to investigate whether the code reading behaviour of developers differs between code that uses Machine Learning and code that does not. Our data show that there are differences in which parts of the code people consider of interest and in how they read it. This is a consequence of differences in both the syntax and the semantics of the code. This reading behaviour already shows that existing solutions cannot be taken as universally applicable. In the future, methods that support Machine Learning must iterate on existing knowledge to meet the challenges of data-driven development.


Cited By

  • (2024) An Investigation of How Software Developers Read Machine Learning Code. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 165-176. DOI: 10.1145/3674805.3686678. Online publication date: 24-Oct-2024
  • (2024) On Eye Tracking in Software Engineering. SN Computer Science 5:6. DOI: 10.1007/s42979-024-03045-3. Online publication date: 26-Jul-2024
  • (2023) Supporting Software Developers Through a Gaze-Based Adaptive IDE. Proceedings of Mensch und Computer 2023, 267-276. DOI: 10.1145/3603555.3603571. Online publication date: 3-Sep-2023

Published In

CHI EA '21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021
2965 pages
ISBN:9781450380959
DOI:10.1145/3411763
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code reading
  2. eye tracking
  3. machine learning

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

CHI '21

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%
