research-article
DOI: 10.1145/3609437.3609457

Practical Accuracy Evaluation for Deep Learning Systems via Latent Representation Discrepancy

Published: 05 October 2023

Abstract

As deep learning (DL) systems are widely deployed in safety-critical scenarios, their quality and reliability have raised growing concerns. Assuring the quality and evaluating the accuracy of DL models is challenging because, unlike traditional software, DL systems rely on large amounts of labeled data for training and evaluation. A DL model's behavior can vary across datasets with different distributions, and in practical applications the potential distribution shift between training and usage scenarios may degrade the model's performance and introduce extra vulnerability. Although several neuron-coverage testing criteria have been proposed to assist in testing DL systems, they remain limited by the amount of labeled data available, and manually labeling test data collected from real-world application scenarios is time-consuming and costly.
In this paper, we propose a novel testing metric, LRD (Latent Representation Discrepancy), to evaluate the practical accuracy of deep learning systems without requiring the ground truth of test data. LRD extracts latent representations from the model as it processes input data, constructs representation patterns based on the training dataset, and uses optimal transport theory to compare the model's behavior on real-world test data against its behavior on the training and out-of-distribution (OOD) sets. The paper further introduces two algorithms powered by these latent representations: OOD data detection and LRD-guided test selection for model retraining. Experimental results show that LRD's evaluation results have a significant positive correlation with the actual accuracy of the model, and that the proposed algorithms are more effective than related OOD detection and test prioritization techniques.
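The optimal-transport comparison the abstract describes can be illustrated with the 1-D Wasserstein distance. The sketch below is a minimal illustration, not the paper's actual LRD metric: the `lrd_score` function, the per-dimension averaging, and the synthetic Gaussian "latent features" are all assumptions made for the example. It shows the core intuition that a distribution shift in latent space yields a larger discrepancy score.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def lrd_score(train_feats, test_feats):
    """Hypothetical discrepancy score: 1-D Wasserstein distance between
    latent-feature distributions, computed per dimension and averaged."""
    dists = [wasserstein_distance(train_feats[:, d], test_feats[:, d])
             for d in range(train_feats.shape[1])]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in for training-set latents
near  = rng.normal(0.0, 1.0, size=(500, 8))   # test data from the same distribution
far   = rng.normal(3.0, 1.0, size=(500, 8))   # test data with a mean shift (OOD-like)

# A shifted test distribution produces a larger discrepancy than an
# in-distribution one, which is the signal LRD-style metrics rely on.
print(lrd_score(train, near) < lrd_score(train, far))  # True
```

A real implementation would use representations extracted from the model's hidden layers rather than raw synthetic vectors, and the paper's actual construction of representation patterns differs from this per-dimension average.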



Published In

Internetware '23: Proceedings of the 14th Asia-Pacific Symposium on Internetware
August 2023, 332 pages
ISBN: 9798400708947
DOI: 10.1145/3609437

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. deep learning systems
    2. deep neural network testing
    3. quality assurance
    4. test optimization

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Internetware 2023

    Acceptance Rates

    Overall Acceptance Rate 55 of 111 submissions, 50%
