Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments
Introduction
Deep neural networks (DNNs) have recently achieved outstanding predictive performance and have become an indispensable tool in a wide range of pattern recognition applications (e.g., image classification, object detection, video understanding, and document analysis). Despite their impressively high predictive accuracy, DNNs are often perceived as black-box models with deep, computationally expensive layers, and they have recently been found vulnerable to spoofing with carefully crafted input samples in many safety-critical applications. Extensive evidence of such vulnerabilities has emerged in several sensitive or real-time pattern recognition applications, including medical diagnosis, face recognition, and self-driving cars. In these scenarios, the cost of a single prediction error can be significantly high, so trust in the trained model and its capability to deliver both efficient and robust data processing must be guaranteed. Therefore, understanding the behaviors of DNNs, gaining insights into their working mechanisms, and further building explainable deep learning models have become essential and fundamental problems.
Though DNNs have shown remarkable progress across a broad spectrum of applications, it is unfortunately unclear what information must be present in the input data, and how it must be used by deep learning models, to guarantee fast, safe, and stable predictions. There has recently been an explosion of interest in related research directions, such as (a) interpreting the learned representations and generated decisions, or quantifying the reliability of those decisions; (b) analyzing the computational bottlenecks in network architectures for efficient learning; and (c) regularizing the network structure or designing specific training schemes for stable and robust prediction. These encouraging signs of progress have profound implications for research on the topic of Explainable Deep Learning for Efficient and Robust Pattern Recognition. This essential and open research topic brings new challenges and opportunities to the pattern recognition community. Tremendous efforts are required to uncover the fundamental mechanisms from several different points of view, including information theory, machine learning, computer vision, and information security. Moreover, this line of work potentially benefits a variety of closely related areas in pattern recognition and opens up the possibility of practical safety-critical or low-cost applications. This special issue aims to provide a forum for researchers and practitioners in the broad deep learning and pattern recognition community to present their novel and original research on explainable deep learning.
We received 58 high-quality submissions and accepted 30 papers after a rigorous review process. The accepted papers cover key tasks in pattern recognition, such as image classification, recognition, clustering, semantic segmentation, object detection, zero-shot learning, and domain adaptation, all of which involve explainable deep learning for efficient and robust pattern recognition. These papers can be roughly divided into three major categories: (1) quantifying or visualizing the interpretability of deep neural networks for explainable deep learning, (2) structure optimization of neural networks in pattern recognition applications for efficient deep learning, and (3) adversarial attacks and defenses in critical pattern recognition applications, together with stability improvements of neural network optimization, for robust deep learning.
In this guest editorial, we briefly review representative works and recent advances of explainable deep learning for efficient and robust pattern recognition in a structured framework, followed by introducing the accepted papers for this special issue. They are grouped into three categories, i.e., explainable, efficient and robust deep learning. The overall structure of this editorial is shown in Fig. 1. We hope that future developments of explainable, efficient and robust deep learning in pattern recognition applications could be promoted by the survey of recent developments and the contributed papers in this special issue.
Explainable deep learning methods
There are plenty of approaches that aim to explain the working mechanisms of deep neural networks. A majority of explanation methods focus on attributing the prediction of a DNN to its input features [1]. Such attribution-based methods cover most visualization methods in computer vision, which give explanations directly in the domain of input images by localizing the regions that contribute most to the decision. Besides, many other non-attribution-based methods provide explanations in
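As a concrete illustration of attribution, the integrated gradients method (referenced below as "Axiomatic attribution for deep networks") can be sketched in a few lines of NumPy. The toy model, weights, and baseline here are purely illustrative assumptions, not taken from any cited work:

```python
import numpy as np

def model(x, w):
    """Toy scoring function: a weighted sum squashed by tanh."""
    return np.tanh(x @ w)

def grad_model(x, w):
    """Analytic gradient of the toy model w.r.t. the input x."""
    return (1.0 - np.tanh(x @ w) ** 2) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 of
    df(x' + a(x - x'))/dx_i da with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    diff = x - baseline
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_model(baseline + a * diff, w)
    return diff * total / steps

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)

attr = integrated_gradients(x, baseline, w)
# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr, attr.sum())
```

The completeness check at the end is the key property that distinguishes integrated gradients from a plain input-gradient saliency map.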
Efficient deep learning via model compression and acceleration
Deep neural networks have achieved the highest accuracy in various tasks at the expense of large numbers of parameters, requiring significant computational resources and training time. Thus, there is a huge demand for model compression and acceleration techniques before deployment to resource-constrained devices and real-time applications. In recent years, a growing number of methods have been presented for compressing and accelerating networks while making the slightest compromise with the model
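As a minimal illustration of one compression family surveyed here, the sketch below applies magnitude-based weight pruning to a single toy weight matrix. The layer shape and the 90% sparsity target are illustrative assumptions, not a prescription from any cited work:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the
    weights; return the pruned tensor and the binary keep-mask."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))          # stand-in for one dense layer
w_pruned, mask = prune_by_magnitude(w, sparsity=0.9)

print(f"kept {mask.mean():.1%} of weights")
```

In practice such pruning is followed by fine-tuning, and the surviving weights can be stored in a sparse format to realize the memory and speed savings.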
Robustness and stability in deep learning
Robustness reflects a model's ability to provide reliable decisions under data noise. In recent years, several aspects of robustness related to deep learning have been studied. The hottest topic among them is adversarial robustness, as it is closely tied to safety in deployed applications. Stability is another critical issue of deep neural networks that determines whether a network converges successfully. Several techniques can help with training stability, such as normalization
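As an illustration of how adversarial examples arise, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic classifier; the weights, input, and perturbation budget are illustrative assumptions, not from any cited work:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb x by eps in the sign of the loss gradient.
    For logistic loss, dL/dx = (sigmoid(w.x + b) - y) * w."""
    grad_x = (sigmoid(x @ w + b) - y) * w
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.8, -0.6, 0.4])   # clean score w.x + b = 2.2 > 0
y = 1.0                          # true label: positive class

x_adv = fgsm(x, y, w, b, eps=0.7)
# The bounded perturbation flips the sign of the decision score.
print(x @ w + b, x_adv @ w + b)
```

Adversarial training, one common defense, simply feeds such perturbed samples back into the training loop so the decision boundary moves away from clean inputs.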
Conclusion
This guest editorial provides a comprehensive review of the representative works and recent developments in explainable deep learning for efficient and robust pattern recognition and introduces the thirty accepted papers for this special issue. The accepted papers are of high quality, providing the most recent advances in improving the interpretability of deep learning methods, designing compact and efficient network architectures in specific pattern recognition problems, designing novel
Acknowledgments
This work was supported by the National Natural Science Foundation of China (project no. 61772057), the Beijing Natural Science Foundation (4202039), and support funding from the Jiangxi Research Institute of Beihang University.
References (133)
- et al., Explaining the semantics capturing capability of scene graph generation models, Pattern Recognit. (2021)
- et al., Explainable skin lesion diagnosis using taxonomies, Pattern Recognit. (2021)
- et al., Graph-based neural networks for explainable image privacy inference, Pattern Recognit. (2020)
- et al., Towards interpretable and robust hand detection via pixel-wise prediction, Pattern Recognit. (2020)
- et al., Robust one-stage object detection with location-aware classifiers, Pattern Recognit. (2020)
- et al., Deep multi-task learning with relational attention for business success prediction, Pattern Recognit. (2021)
- et al., End-to-end video text detection with online tracking, Pattern Recognit. (2021)
- et al., Learning EEG topographical representation for classification via convolutional neural network, Pattern Recognit. (2020)
- et al., Self-attention driven adversarial similarity learning network, Pattern Recognit. (2020)
- et al., Deep transductive network for generalized zero shot learning, Pattern Recognit. (2020)
- Exploring uncertainty in pseudo-label guided unsupervised domain adaptation, Pattern Recognit.
- Probabilistic framework for solving visual dialog, Pattern Recognit.
- Deep features for person re-identification on metric learning, Pattern Recognit.
- Towards non-IID image classification: a dataset and baselines, Pattern Recognit.
- Goal driven network pruning for object recognition, Pattern Recognit.
- Deep quantization generative networks, Pattern Recognit.
- Binary neural networks: a survey, Pattern Recognit.
- Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis, Pattern Recognit.
- Efficient semantic segmentation with pyramidal fusion, Pattern Recognit.
- Gated CNN: integrating multi-scale feature layers for object detection, Pattern Recognit.
- Heterogenous output regression network for direct face alignment, Pattern Recognit.
- Learning residual refinement network with semantic context representation for real-time saliency object detection, Pattern Recognit.
- LOW: training deep neural networks by learning optimal sample weights, Pattern Recognit.
- Axiomatic attribution for deep networks, International Conference on Machine Learning (ICML), PMLR
- Learning deep features for discriminative localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deep inside convolutional networks: visualising image classification models and saliency maps, International Conference on Learning Representations (ICLR Workshop Track)
- Striving for simplicity: the all convolutional net, International Conference on Learning Representations (ICLR Workshop Track)
- Grad-CAM: visual explanations from deep networks via gradient-based localization, Int. J. Comput. Vis.
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS One
- Learning important features through propagating activation differences, International Conference on Machine Learning (ICML), PMLR
- Towards better understanding of gradient-based attribution methods for deep neural networks, International Conference on Learning Representations (ICLR)
- Interpretable explanations of black boxes by meaningful perturbation, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Interpretable and fine-grained visual explanations for convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- RISE: randomized input sampling for explanation of black-box models, British Machine Vision Conference (BMVC)
- Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV)
- Score-CAM: score-weighted visual explanations for convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops
- A unified approach to interpreting model predictions, Advances in Neural Information Processing Systems (NeurIPS)
- "Why should I trust you?" Explaining the predictions of any classifier, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV), International Conference on Machine Learning (ICML), PMLR
- Examples are not enough, learn to criticize! Criticism for interpretability, Advances in Neural Information Processing Systems (NeurIPS)
- Counterfactual explanations without opening the black box: automated decisions and the GDPR, Harv. J. Law Tech.
- What uncertainties do we need in Bayesian deep learning for computer vision?, Advances in Neural Information Processing Systems (NeurIPS)
- Dropout as a Bayesian approximation: representing model uncertainty in deep learning, International Conference on Machine Learning (ICML), PMLR
- Bounding box regression with uncertainty for accurate object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Gaussian YOLOv3: an accurate and fast object detector using localization uncertainty for autonomous driving, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Simple and scalable predictive uncertainty estimation using deep ensembles, Advances in Neural Information Processing Systems (NeurIPS)
- Predictive uncertainty estimation via prior networks, Advances in Neural Information Processing Systems (NeurIPS)
- Evidential deep learning to quantify classification uncertainty, Advances in Neural Information Processing Systems (NeurIPS)
- Deep evidential regression, Advances in Neural Information Processing Systems (NeurIPS)
Xiao Bai received the B.Eng. degree in computer science from Beihang University, Beijing, China, in 2001, and the Ph.D. degree in computer science from the University of York, York, U.K., in 2006. He was a Research Officer (Fellow and Scientist) with the Computer Science Department, University of Bath, Bath, U.K., until 2008. He is currently a Full Professor with the School of Computer Science and Engineering, Beihang University. He has authored or coauthored more than 100 papers in journals and refereed conferences. His current research interests include pattern recognition, image processing, and remote sensing image analysis. He is an Associate Editor of Pattern Recognition and Signal Processing.
Xiang Wang received the B.S. degree in mathematics and applied mathematics from Beihang University, Beijing, China, in 2017, where he is currently pursuing the Ph.D. degree in computer science and engineering. His research interests include 3-D computer vision and deep learning.
Xianglong Liu received the B.S. and Ph.D. degrees in computer science from Beihang University, Beijing, China, in 2008 and 2014, respectively. From 2011 to 2012, he visited the Digital Video and Multimedia (DVMM) Lab, Columbia University as a joint Ph.D. student. He is currently an Associate Professor with the School of Computer Science and Engineering, Beihang University. He has published over 40 research papers at top venues like the IEEE Transactions on Image Processing, the IEEE Transactions on Cybernetics, Pattern Recognition, the Conference on Computer Vision and Pattern Recognition, the International Conference on Computer Vision, and the Association for the Advancement of Artificial Intelligence. His research interests include machine learning, computer vision and multimedia information retrieval.
Qiang Liu is currently an assistant professor with the University of Texas at Austin. His research area is machine learning and statistics, with interests spanning the pipeline of data collection (e.g., by crowdsourcing), learning, inference, decision making, and various applications using probabilistic modeling. He is an action editor of the Journal of Machine Learning Research (JMLR).
Jingkuan Song is a professor with University of Electronic Science and Technology of China (UESTC). His research interest includes large-scale multimedia retrieval, image/video segmentation and image/video understanding using hashing, graph learning and deep learning techniques. He was the winner of the Best Paper Award in ICPR (2016, Mexico), Best Student Paper Award in Australian Database Conference (2017, Australia), and Best Paper Honorable Mention Award (2017, Japan). He is an Associate Editor of ACM TOMM, Guest Editor of TMM, WWWJ, PR, and he is/was AC/SPC/PC member of CVPR’18-21, MM’18-21, AAAI’18-21, etc.
Nicu Sebe received the Ph.D. degree from Leiden University, The Netherlands, in 2001. He is a Professor with the University of Trento, Trento, Italy, leading the research in the areas of multimedia information retrieval and human behavior understanding. Prof. Sebe was the General Co-Chair of the IEEE FG Conference 2008 and ACM Multimedia 2013, and the Program Chair of the International Conference on Image and Video Retrieval in 2007 and 2010, ACM Multimedia in 2007 and 2011, and ICCV 2017 and ECCV 2016. He was the General Chair of ACM ICMR 2017. He is a Fellow of IAPR.
Been Kim is currently a staff research scientist at Google Brain. She is interested in designing high-performance machine learning methods that make sense to humans. Her focus is building interpretability methods for already-trained models or building inherently interpretable models.