ABSTRACT
Modern convolutional neural networks (CNNs) incur huge computational and energy overheads. In this paper, we propose two techniques for estimating, in the early layers of a CNN, the confidence that a prediction is correct. The first technique uses a statistical approach, whereas the second requires retraining. We argue that prediction-confidence estimation enables diverse CNN optimizations, and we demonstrate two. First, we predict selected images in the early layers. This is possible because many images in a dataset are easy and can be classified correctly by the early layers of a CNN; exiting early reduces the average computation per image, at the cost of some accuracy and additional parameters. Second, we predict only those images for which the prediction confidence is high. This reduces coverage; however, the accuracy on the images that are predicted increases. Our results with the VGG16 and ResNet50 CNNs on the Caltech256 dataset show that our techniques are effective. For example, for ResNet50, our first technique reduces computations by 14% while lowering accuracy from 71.6% to 69.8%. Similarly, with the second technique, reducing coverage from 100% to 90% increases accuracy from 71.6% to 75.6%.
Keywords: computer vision, CNN, approximate computing, accuracy-coverage tradeoff, prediction confidence
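To make the two optimizations concrete, below is a minimal PyTorch sketch, assuming a hypothetical two-stage backbone with an auxiliary exit head. The module names (`early_layers`, `exit_head`, `late_layers`, `final_head`), the maximum-softmax-probability confidence estimator, and the 0.9 threshold are illustrative assumptions, not the paper's exact architecture or its statistical confidence technique.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitCNN(nn.Module):
    """Toy CNN with one confidence-gated early exit (illustrative only)."""

    def __init__(self, num_classes=257, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        # Hypothetical stand-ins for the early and late stages of a
        # backbone such as VGG16 or ResNet50.
        self.early_layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.exit_head = nn.Linear(32, num_classes)    # early classifier
        self.late_layers = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.final_head = nn.Linear(64, num_classes)   # full-depth classifier

    def forward(self, x):
        feats = self.early_layers(x)
        early_logits = self.exit_head(feats)
        # Confidence estimate: maximum softmax probability of the exit head.
        conf = F.softmax(early_logits, dim=1).max(dim=1).values
        if bool((conf >= self.threshold).all()):  # whole batch is "easy"
            return early_logits                   # skip the remaining layers
        return self.final_head(self.late_layers(feats))


def selective_metrics(confidences, predictions, labels, threshold):
    """Coverage and accuracy when predicting only high-confidence images."""
    covered = confidences >= threshold
    coverage = covered.float().mean().item()
    if coverage == 0.0:
        return coverage, float("nan")
    accuracy = (predictions[covered] == labels[covered]).float().mean().item()
    return coverage, accuracy


# Example with a random batch; real use would feed Caltech256 images.
model = EarlyExitCNN()
logits = model(torch.randn(8, 3, 224, 224))
conf, pred = F.softmax(logits, dim=1).max(dim=1)
print(selective_metrics(conf, pred, torch.randint(0, 257, (8,)), 0.5))
```

The `selective_metrics` helper illustrates the accuracy-coverage tradeoff of the second optimization: raising the threshold lowers coverage but tends to raise accuracy on the covered images.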