
Pattern Recognition

Volume 120, December 2021, 108102

Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments

https://doi.org/10.1016/j.patcog.2021.108102

Highlights

  • A detailed survey of explainable deep learning for efficient and robust pattern recognition is presented.

  • Explainable methods for deep neural networks, including visualization and uncertainty estimation, are categorized and presented.

  • Model compression and acceleration methods for efficient deep learning are reviewed.

  • Two major topics related to robust deep learning, adversarial robustness and stability in training neural networks, are covered.

  • The papers accepted for the special issue on explainable deep learning for efficient and robust pattern recognition showcase recent advances and promote further research.

Abstract

Deep learning has recently achieved great success in many visual recognition tasks. However, deep neural networks (DNNs) are often perceived as black boxes, making their decisions less understandable to humans and hindering their use in safety-critical applications. This guest editorial introduces the thirty papers accepted for the Special Issue on Explainable Deep Learning for Efficient and Robust Pattern Recognition. They are grouped into three main categories: explainable deep learning methods, efficient deep learning via model compression and acceleration, and robustness and stability in deep learning. For each of the three topics, a survey of representative works and the latest developments is presented, followed by a brief introduction of the accepted papers belonging to that topic. The special issue should be of high relevance to readers interested in explainable deep learning methods for efficient and robust pattern recognition applications, and it helps promote future research directions in this field.

Introduction

Deep neural networks (DNNs) have recently achieved outstanding predictive performance and have become an indispensable tool in a wide range of pattern recognition applications (e.g., image classification, object detection, video understanding, document analysis, etc.). Despite their impressively high predictive accuracy, DNNs are often perceived as black-box models with deep, computationally expensive layers, and they have recently been found vulnerable to spoofing with well-designed input samples in many safety-critical applications. Extensive evidence of these weaknesses has been reported in several sensitive or real-time pattern recognition applications, including medical diagnosis, face recognition, and self-driving cars. In these scenarios, the cost of a single prediction error can be significantly high, so the reliability of the trained model and its capability to deliver both efficient and robust data processing must be guaranteed. Therefore, understanding the behaviors of DNNs, gaining insights into their working mechanisms, and further building explainable deep learning models have become essential and fundamental problems.

Though DNNs have shown remarkable progress in a broad spectrum of applications, it is, unfortunately, unclear what information must be present in the input data and how it must be used in deep learning models to guarantee fast, safe and stable predictions. Recently there has been an explosion of interest along related research directions, such as (a) interpreting the learned representations and generated decisions or quantifying the reliability of the decisions, (b) analyzing the computational bottlenecks in network architectures for efficient learning, and (c) regularizing the network structure or designing specific training schemes for stable and robust prediction. These encouraging signs of progress have profound implications for research on the topic of Explainable Deep Learning for Efficient and Robust Pattern Recognition. This essential and open research topic brings new challenges and opportunities to the pattern recognition community. Tremendous efforts are required to uncover the fundamental mechanisms from several different points of view, including information theory, machine learning, computer vision, information security, etc. Moreover, it also potentially benefits a variety of closely related areas in pattern recognition and opens up the possibility of practical safety-critical or low-cost applications. This special issue aims to provide a forum for researchers and practitioners in the broad deep learning and pattern recognition community to present their novel and original research on explainable deep learning.

We received 58 high-quality submissions and accepted 30 papers after a rigorous review process. The accepted papers cover key tasks in pattern recognition, such as image classification, recognition, clustering, semantic segmentation, object detection, zero-shot learning, and domain adaptation, all of which involve explainable deep learning for efficient and robust pattern recognition. These papers can be roughly divided into three major categories: (1) quantifying or visualizing the interpretability of deep neural networks, for explainable deep learning; (2) structure optimization of neural networks in pattern recognition applications, for efficient deep learning; and (3) adversarial attacks and defenses in critical pattern recognition applications, together with stability improvements in neural network optimization, for robust deep learning.

In this guest editorial, we briefly review representative works and recent advances in explainable deep learning for efficient and robust pattern recognition in a structured framework, and then introduce the papers accepted for this special issue. They are grouped into three categories, i.e., explainable, efficient and robust deep learning. The overall structure of this editorial is shown in Fig. 1. We hope that the survey of recent developments and the contributed papers in this special issue will promote future developments of explainable, efficient and robust deep learning in pattern recognition applications.


Explainable deep learning methods

There are plenty of approaches that aim at explaining the working mechanism of deep neural networks. A majority of explanation methods focus on attributing the prediction of a DNN to its input features [1]. Such attribution-based methods cover most visualization methods in computer vision, which give explanations directly in the domain of the input image by localizing the regions that contribute most to the decision. Besides, many other non-attribution-based methods provide explanations in…
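As a concrete illustration of the attribution idea, the sketch below computes a plain gradient saliency map, assuming a PyTorch setting; the ResNet-18 backbone and the random input image are illustrative placeholders, not choices taken from any of the accepted papers.

```python
# A minimal sketch of gradient-based attribution (a saliency map), assuming PyTorch;
# the untrained ResNet-18 and the random input are stand-ins for illustration only.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # placeholder network, no pretrained weights
model.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in input image

logits = model(image)
target_class = logits.argmax(dim=1).item()

# Back-propagate the score of the predicted class to the input pixels.
logits[0, target_class].backward()

# The saliency map takes the maximum absolute gradient over the colour channels:
# large values mark pixels whose change would most affect the class score.
saliency = image.grad.abs().max(dim=1)[0]  # shape (1, 224, 224)
print(saliency.shape)
```

Grad-CAM-style methods follow the same principle but aggregate gradients at an intermediate convolutional layer rather than at the input, yielding coarser but class-discriminative heatmaps.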

Efficient deep learning via model compression and acceleration

Deep neural networks have achieved the highest precision in various tasks at the expense of a large number of parameters, requiring significant computational resources and training time. There is thus a huge demand for model compression and acceleration techniques before deploying these networks to resource-constrained devices and real-time applications. In recent years, a growing number of methods have been presented for compressing and accelerating networks with minimal compromise to the model…
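As one illustrative compression technique, the sketch below applies magnitude-based (L1) unstructured pruning using PyTorch's pruning utilities; the tiny two-layer network and the 30% sparsity level are hypothetical choices for illustration, not settings from any accepted paper.

```python
# A minimal sketch of magnitude-based weight pruning for model compression, assuming
# PyTorch; the tiny convolutional network and the 30% sparsity are illustrative only.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)

# Zero out the 30% of weights with the smallest magnitude in each convolutional layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # fold the pruning mask into the weights

# Report the resulting sparsity of each weight tensor.
for name, param in model.named_parameters():
    if name.endswith("weight"):
        print(f"{name}: {(param == 0).float().mean().item():.0%} zeros")
```

Such unstructured sparsity mainly reduces storage; structured pruning, quantization, and knowledge distillation are the usual routes to actual speed-ups on hardware.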

Robustness and stability in deep learning

Robustness refers to a model’s ability to provide reliable decisions under data noise. In recent years, several aspects of robustness related to deep learning have been studied. The most actively studied topic among them is adversarial robustness, as it is closely tied to safety issues in applications. Stability is another critical issue of deep neural networks that determines whether a network converges successfully. Several techniques can help with training stability, such as normalization…
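To ground the discussion of adversarial robustness, the following sketch implements the fast gradient sign method (FGSM), the canonical single-step attack; the untrained ResNet-18, the random input, the label, and the 8/255 budget are placeholders for illustration only.

```python
# A minimal sketch of the fast gradient sign method (FGSM) attack, assuming PyTorch;
# the untrained ResNet-18, random input, label and epsilon are illustrative stand-ins.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights=None)
model.eval()

x = torch.rand(1, 3, 224, 224)   # stand-in input image with pixel values in [0, 1]
y = torch.tensor([7])            # stand-in ground-truth label
epsilon = 8 / 255                # perturbation budget in the L-infinity norm

x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()

# Step in the direction that increases the loss, then clip back to valid pixel values.
x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

with torch.no_grad():
    print("clean prediction:      ", model(x).argmax(dim=1).item())
    print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Adversarial training, which minimizes the loss on such perturbed inputs rather than on clean ones, remains the most widely used defence; normalization schemes address the separate question of optimization stability mentioned above.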

Conclusion

This guest editorial provides a comprehensive review of the representative works and recent developments in explainable deep learning for efficient and robust pattern recognition and introduces the thirty papers accepted for this special issue. The accepted papers are of high quality, providing the most recent advances in improving the interpretability of deep learning methods, designing compact and efficient network architectures for specific pattern recognition problems, designing novel…

Acknowledgments

This work was supported by the National Natural Science Foundation of China (project no. 61772057), the Beijing Natural Science Foundation (4202039), and support funding from the Jiangxi Research Institute of Beihang University.


References (133)

  • J. Liang et al., Exploring uncertainty in pseudo-label guided unsupervised domain adaptation, Pattern Recognit. (2019)
  • B.N. Patro et al., Probabilistic framework for solving visual dialog, Pattern Recognit. (2021)
  • W. Wu et al., Deep features for person re-identification on metric learning, Pattern Recognit. (2021)
  • Y. He et al., Towards non-IID image classification: a dataset and baselines, Pattern Recognit. (2021)
  • C. Kaplan et al., Goal driven network pruning for object recognition, Pattern Recognit. (2021)
  • D. Wan et al., Deep quantization generative networks, Pattern Recognit. (2020)
  • H. Qin et al., Binary neural networks: a survey, Pattern Recognit. (2020)
  • L. Gao et al., Lightweight dynamic conditional GAN with pyramid attention for text-to-image synthesis, Pattern Recognit. (2021)
  • M. Oršić et al., Efficient semantic segmentation with pyramidal fusion, Pattern Recognit. (2021)
  • J. Yuan et al., Gated CNN: integrating multi-scale feature layers for object detection, Pattern Recognit. (2020)
  • X. Zhen et al., Heterogenous output regression network for direct face alignment, Pattern Recognit. (2020)
  • T. Li et al., Learning residual refinement network with semantic context representation for real-time saliency object detection, Pattern Recognit. (2020)
  • C. Santiago et al., LOW: training deep neural networks by learning optimal sample weights, Pattern Recognit. (2021)
  • M. Sundararajan et al., Axiomatic attribution for deep networks, International Conference on Machine Learning (ICML), PMLR (2017)
  • B. Zhou et al., Learning deep features for discriminative localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  • K. Simonyan et al., Deep inside convolutional networks: visualising image classification models and saliency maps, International Conference on Learning Representations (ICLR Workshop Track) (2014)
  • J. Springenberg et al., Striving for simplicity: the all convolutional net, International Conference on Learning Representations (ICLR Workshop Track) (2015)
  • R.R. Selvaraju et al., Grad-CAM: visual explanations from deep networks via gradient-based localization, Int. J. Comput. Vis. (2020)
  • A. Shrikumar, P. Greenside, A. Shcherbina, A. Kundaje, Not just a black box: learning important features through...
  • S. Bach et al., On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS One (2015)
  • A. Shrikumar et al., Learning important features through propagating activation differences, International Conference on Machine Learning (ICML), PMLR (2017)
  • M. Ancona et al., Towards better understanding of gradient-based attribution methods for deep neural networks, International Conference on Learning Representations (ICLR) (2018)
  • R.C. Fong et al., Interpretable explanations of black boxes by meaningful perturbation, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017)
  • J. Wagner et al., Interpretable and fine-grained visual explanations for convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
  • V. Petsiuk et al., RISE: randomized input sampling for explanation of black-box models, British Machine Vision Conference (BMVC) (2018)
  • H.G. Ramaswamy, Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV) (2020)
  • H. Wang et al., Score-CAM: score-weighted visual explanations for convolutional neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2020)
  • S.M. Lundberg et al., A unified approach to interpreting model predictions, Advances in Neural Information Processing Systems (NeurIPS) (2017)
  • M.T. Ribeiro et al., “Why should I trust you?” Explaining the predictions of any classifier, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
  • B. Kim et al., Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV), International Conference on Machine Learning (ICML), PMLR (2018)
  • B. Kim et al., Examples are not enough, learn to criticize! Criticism for interpretability, Advances in Neural Information Processing Systems (NeurIPS) (2016)
  • S. Wachter et al., Counterfactual explanations without opening the black box: automated decisions and the GDPR, Harv. J. Law Tech. (2017)
  • A. Kendall et al., What uncertainties do we need in Bayesian deep learning for computer vision?, Advances in Neural Information Processing Systems (NeurIPS) (2017)
  • Y. Gal et al., Dropout as a Bayesian approximation: representing model uncertainty in deep learning, International Conference on Machine Learning (ICML), PMLR (2016)
  • Y. He et al., Bounding box regression with uncertainty for accurate object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
  • J. Choi et al., Gaussian YOLOv3: an accurate and fast object detector using localization uncertainty for autonomous driving, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019)
  • B. Lakshminarayanan et al., Simple and scalable predictive uncertainty estimation using deep ensembles, Advances in Neural Information Processing Systems (NeurIPS) (2017)
  • A. Malinin et al., Predictive uncertainty estimation via prior networks, Advances in Neural Information Processing Systems (NeurIPS) (2018)
  • M. Sensoy et al., Evidential deep learning to quantify classification uncertainty, Advances in Neural Information Processing Systems (NeurIPS) (2018)
  • A. Amini et al., Deep evidential regression, Advances in Neural Information Processing Systems (NeurIPS) (2020)

    Xiao Bai received the B.Eng. degree in computer science from Beihang University, Beijing, China, in 2001, and the Ph.D. degree in computer science from the University of York, York, U.K., in 2006. He was a Research Officer (Fellow and Scientist) with the Computer Science Department, University of Bath, Bath, U.K., until 2008. He is currently a Full Professor with the School of Computer Science and Engineering, Beihang University. He has authored or coauthored more than 100 papers in journals and refereed conferences. His current research interests include pattern recognition, image processing, and remote sensing image analysis. He is an Associate Editor of Pattern Recognition and Signal Processing.

    Xiang Wang received the B.S. degree in mathematics and applied mathematics from Beihang University, Beijing, China, in 2017, where he is currently pursuing the Ph.D. degree in computer science and engineering. His research interests include 3-D computer vision and deep learning.

    Xianglong Liu received the B.S. and Ph.D. degrees in computer science from Beihang University, Beijing, China, in 2008 and 2014, respectively. From 2011 to 2012, he visited the Digital Video and Multimedia (DVMM) Lab, Columbia University as a joint Ph.D. student. He is currently an Associate Professor with the School of Computer Science and Engineering, Beihang University. He has published over 40 research papers at top venues like the IEEE Transactions on Image Processing, the IEEE Transactions on Cybernetics, Pattern Recognition, the Conference on Computer Vision and Pattern Recognition, the International Conference on Computer Vision, and the Association for the Advancement of Artificial Intelligence. His research interests include machine learning, computer vision and multimedia information retrieval.

    Qiang Liu is currently an assistant professor with the University of Texas at Austin. His research area is machine learning and statistics, with interests spanning the pipeline of data collection (e.g., by crowdsourcing), learning, inference, decision making, and various applications using probabilistic modeling. He is an action editor of the Journal of Machine Learning Research (JMLR).

    Jingkuan Song is a professor with the University of Electronic Science and Technology of China (UESTC). His research interests include large-scale multimedia retrieval, image/video segmentation and image/video understanding using hashing, graph learning and deep learning techniques. He won the Best Paper Award at ICPR (2016, Mexico), the Best Student Paper Award at the Australian Database Conference (2017, Australia), and a Best Paper Honorable Mention Award (2017, Japan). He is an Associate Editor of ACM TOMM, a Guest Editor of TMM, WWWJ, and PR, and he is/was an AC/SPC/PC member of CVPR’18-21, MM’18-21, AAAI’18-21, etc.

    Nicu Sebe received the Ph.D. degree from Leiden University, The Netherlands, in 2001. He is a Professor with the University of Trento, Trento, Italy, leading the research in the areas of multimedia information retrieval and human behavior understanding. Prof. Sebe was the General Co-Chair of the IEEE FG Conference 2008 and ACM Multimedia 2013, and the Program Chair of the International Conference on Image and Video Retrieval in 2007 and 2010, ACM Multimedia in 2007 and 2011, and ICCV 2017 and ECCV 2016. He was the General Chair of ACM ICMR 2017. He is a Fellow of IAPR.

    Been Kim is currently a staff research scientist at Google Brain. She is interested in designing high-performance machine learning methods that make sense to humans. Her focus is building interpretability methods for already-trained models or building inherently interpretable models.
