DOI: 10.1145/3611643.3616252
Research Article

DeepDebugger: An Interactive Time-Travelling Debugging Approach for Deep Classifiers

Published: 30 November 2023

Abstract

A deep classifier is usually trained to (i) learn a numeric representation vector for each sample and (ii) classify the sample representations with learned classification boundaries. Time-travelling visualization, an explainable-AI technique, transforms the model training dynamics into an animation of a canvas with colorful dots and territories. Although the training dynamics of high-level concepts such as sample representations and classification boundaries thereby become observable, model developers can still be overwhelmed by tens of thousands of moving dots across hundreds of training epochs (i.e., frames in the animation), causing them to miss important training events such as abnormal movement dynamics (i.e., learning behavior) of certain samples.
In this work, we make the first attempt to develop model time-travelling visualizers into model time-travelling debuggers for practical use in model debugging tasks. Specifically, given an animation of the model training dynamics of sample representations and the classification landscape, we propose DeepDebugger, a solution that recommends samples of user interest in a human-in-the-loop manner. On one hand, DeepDebugger monitors the training dynamics of samples and recommends suspicious ones based on the abnormality of their training dynamics and model predictions. On the other hand, our recommendation is interactive and fault-resilient, allowing model developers to explore the training process. By learning from users' feedback, DeepDebugger refines its recommendations to fit their intention. Our extensive experiments applying DeepDebugger to known time-travelling visualizers show that DeepDebugger can (1) detect the majority of abnormal movements of the training samples on canvas; (2) significantly boost the recommendation performance for samples of interest (5-10X more accurate than the baselines) with a runtime overhead of 0.015s per feedback; and (3) remain resilient under 3%, 5%, and 10% rates of mistaken user feedback. Our user study, with 16 participants on two model debugging tasks, shows that the interactive recommendation of DeepDebugger helps participants accomplish debugging tasks, saving 18.1% of completion time and boosting performance by 20.3%.
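The two mechanisms described above, monitoring per-sample movement on the canvas for abnormality and refining recommendations from user feedback, can be sketched as follows. This is a hypothetical illustration, not DeepDebugger's published algorithm: the function names, the z-score criterion for "abnormal movement", and the simple feedback reweighting are all assumptions made for the sketch.

```python
import numpy as np

def abnormality_scores(trajectories):
    """Score each sample by how erratically its 2D canvas position moves
    across epochs. trajectories has shape (n_samples, n_epochs, 2).
    Hypothetical stand-in for the monitor: a sample whose per-epoch
    displacement deviates most from the cohort is flagged as suspicious."""
    steps = np.diff(trajectories, axis=1)         # per-epoch displacement vectors
    step_len = np.linalg.norm(steps, axis=2)      # (n_samples, n_epochs - 1)
    mean = step_len.mean(axis=0)                  # cohort mean step per epoch
    std = step_len.std(axis=0) + 1e-8             # cohort spread per epoch
    z = (step_len - mean) / std                   # per-epoch z-score
    return np.abs(z).max(axis=1)                  # worst-epoch deviation per sample

def refine_with_feedback(scores, feedback, lr=0.5):
    """Naive human-in-the-loop refinement: raise or lower the score of each
    sample the user marked relevant (+1) or irrelevant (-1)."""
    scores = scores.copy()
    for idx, label in feedback.items():
        scores[idx] += lr * label * scores.std()
    return scores
```

For example, among 50 samples drifting smoothly on the canvas, one whose position jumps abruptly mid-training receives the highest abnormality score; marking it irrelevant in a feedback round then demotes it in the next recommendation.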

Supplementary Material

Video (fse23main-p111-p-video.mp4)


Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN:9798400703270
DOI:10.1145/3611643
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. debugging
  2. deep classifier
  3. user study
  4. visualization

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • the Minister of Education, Singapore
  • NUS-NCS Joint Laboratory for Cyber Security, Singapore, the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity Research and Development Programme
  • A*STAR, CISCO Systems (USA) Pte. Ltd and National University of Singapore under its Cisco-NUS Accelerated Digital Economy Corporate Laboratory
  • National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme

Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
