ABSTRACT
While network binarization is a promising route to memory savings and hardware speedups, it inevitably introduces binarization residual errors in the intermediate features, degrading performance. To alleviate this issue, we focus on network architecture, designing structures better suited to the extremely low-bit setting. In this paper, we propose a baseline-auxiliary network design method that compensates for the binarization residual of features by searching for auxiliary branches under the guidance of a feature-similarity confidence score. The intermediate feature maps are enhanced by combining baseline and auxiliary features so that they mimic the corresponding features of the full-precision network. In addition, we devise a novel diversity loss for the retraining process, which reduces information redundancy and increases the diversity between the auxiliary branches and the binary network. Extensive experiments show that our approach is superior in both accuracy and computational cost, and that it is plug-and-play across network backbones and binarization policies.
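The abstract does not specify the exact search space, confidence score, or loss formulation, so the following PyTorch-style sketch only illustrates the general idea: a binary baseline branch whose residual error is compensated by an additive searched auxiliary branch, a cosine-similarity score against full-precision features, and a diversity penalty between the two branches. All names here (BinaryActivation, AuxCompensatedBlock, feature_similarity_score, diversity_loss, aux_op) are hypothetical, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryActivation(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE) gradient."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        # Pass gradients only where |x| <= 1, the standard STE clipping.
        return grad_out * (x.abs() <= 1).float()

class AuxCompensatedBlock(nn.Module):
    """Hypothetical block: a baseline conv on binarized activations plus a
    searched auxiliary branch whose output is added to compensate the
    binarization residual. Weight binarization is omitted for brevity."""
    def __init__(self, channels, aux_op):
        super().__init__()
        self.baseline = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.aux = aux_op  # candidate op chosen by the architecture search
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        xb = BinaryActivation.apply(x)   # binarized input activations
        base = self.baseline(xb)         # baseline feature
        comp = self.aux(xb)              # auxiliary compensation feature
        return self.bn(base + comp)      # enhanced intermediate feature

def feature_similarity_score(enhanced, full_precision):
    """Confidence-style score: how closely the enhanced binary feature
    mimics the corresponding full-precision feature (cosine similarity)."""
    return F.cosine_similarity(enhanced.flatten(1),
                               full_precision.flatten(1), dim=1).mean()

def diversity_loss(base_feat, aux_feat):
    """Penalize similarity between baseline and auxiliary features so the
    auxiliary branch carries complementary, non-redundant information."""
    cos = F.cosine_similarity(base_feat.flatten(1), aux_feat.flatten(1), dim=1)
    return cos.abs().mean()
```

For instance, AuxCompensatedBlock(64, nn.Conv2d(64, 64, 1, bias=False)) instantiates the block with a 1x1 auxiliary candidate; during retraining, diversity_loss would be added to the task loss with a weighting coefficient.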
Index Terms
- Baseline-auxiliary Network Architecture Design Scheme to Compensate for Binarization Residual Errors