DOI: 10.1145/3611643.3616252
Research Article

DeepDebugger: An Interactive Time-Travelling Debugging Approach for Deep Classifiers

Published: 30 November 2023

Abstract

A deep classifier is usually trained to (i) learn a numeric representation vector for each sample and (ii) classify the sample representations with learned classification boundaries. Time-travelling visualization, an explainable-AI technique, transforms the model training dynamics into an animation of a canvas with colorful dots and territories. Although the training dynamics of high-level concepts such as sample representations and classification boundaries thereby become observable, model developers can still be overwhelmed by tens of thousands of moving dots across hundreds of training epochs (i.e., frames in the animation), causing them to miss important training events such as abnormal movement dynamics (i.e., learning behavior) of certain samples.
In this work, we make the first attempt to develop model time-travelling visualizers into model time-travelling debuggers for practical use in model debugging tasks. Specifically, given an animation of the model training dynamics of sample representations and the classification landscape, we propose DeepDebugger, a solution that recommends samples of user interest in a human-in-the-loop manner. On one hand, DeepDebugger monitors the training dynamics of samples and recommends suspicious ones based on the abnormality of their training dynamics and model predictions. On the other hand, our recommendation is interactive and fault-resilient, allowing model developers to explore the training process. By learning from users' feedback, DeepDebugger refines its recommendations to fit their intention. Our extensive experiments applying DeepDebugger to known time-travelling visualizers show that DeepDebugger can (1) detect the majority of abnormal movements of the training samples on canvas; (2) significantly boost the recommendation performance for samples of interest (5-10X more accurate than the baselines) with a runtime overhead of 0.015s per feedback; and (3) remain resilient under 3%, 5%, and 10% rates of mistaken user feedback. Our user study, with 16 participants on two model debugging tasks, shows that the interactive recommendation of DeepDebugger helps participants accomplish debugging tasks, saving 18.1% of completion time and boosting performance by 20.3%.
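The two mechanisms described above, monitoring per-sample movement on the canvas for abnormality and refining recommendations from user feedback, can be sketched as follows. This is a hypothetical illustration, not DeepDebugger's published algorithm: the function names, the z-score criterion for "abnormal movement", and the simple feedback reweighting are all assumptions made for the sketch.

```python
import numpy as np

def abnormality_scores(trajectories):
    """Score each sample by how erratically its 2D canvas position moves
    across epochs. trajectories has shape (n_samples, n_epochs, 2).
    Hypothetical stand-in for the monitor: a sample whose per-epoch
    displacement deviates most from the cohort is flagged as suspicious."""
    steps = np.diff(trajectories, axis=1)         # per-epoch displacement vectors
    step_len = np.linalg.norm(steps, axis=2)      # (n_samples, n_epochs - 1)
    mean = step_len.mean(axis=0)                  # cohort mean step per epoch
    std = step_len.std(axis=0) + 1e-8             # cohort spread per epoch
    z = (step_len - mean) / std                   # per-epoch z-score
    return np.abs(z).max(axis=1)                  # worst-epoch deviation per sample

def refine_with_feedback(scores, feedback, lr=0.5):
    """Naive human-in-the-loop refinement: raise or lower the score of each
    sample the user marked relevant (+1) or irrelevant (-1)."""
    scores = scores.copy()
    for idx, label in feedback.items():
        scores[idx] += lr * label * scores.std()
    return scores
```

For example, among 50 samples drifting smoothly on the canvas, one whose position jumps abruptly mid-training receives the highest abnormality score; marking it irrelevant in a feedback round then demotes it in the next recommendation.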

Supplementary Material

Video (fse23main-p111-p-video.mp4)


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN:9798400703270
DOI:10.1145/3611643
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. debugging
  2. deep classifier
  3. user study
  4. visualization

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • the Minister of Education, Singapore
  • NUS-NCS Joint Laboratory for Cyber Security, Singapore, the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity Research and Development Programme
  • A*STAR, CISCO Systems (USA) Pte. Ltd and National University of Singapore under its Cisco-NUS Accelerated Digital Economy Corporate Laboratory
  • National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme

Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
