
Pattern Recognition

Volume 109, January 2021, 107610

Efficient densely connected convolutional neural networks

https://doi.org/10.1016/j.patcog.2020.107610

Highlights

  • Proposed two efficient densely connected ConvNets, DenseDsc and Dense2Net.

  • DenseDsc and Dense2Net are more efficient and have higher accuracy than DenseNet.

  • Dense2Net is state-of-the-art on ImageNet among manually designed CNNs with <10 M parameters.

  • DenseDsc and Dense2Net are very flexible and can be used in different applications.

Abstract

Recent works have shown that convolutional neural networks (CNNs) are parameter-redundant, which limits the application of CNNs on mobile devices with limited memory and computational resources. In this paper, two novel and efficient lightweight CNN architectures, called DenseDsc and Dense2Net, are proposed. Both proposed CNNs are densely connected, and the dense connectivity facilitates feature reuse in the networks. Dense2Net adopts efficient group convolution, and DenseDsc adopts the even more efficient depthwise separable convolution. The novel dense blocks of DenseDsc and Dense2Net improve parameter efficiency. The proposed DenseDsc and Dense2Net are evaluated on highly competitive classification benchmark datasets (CIFAR and ImageNet). The experimental results show that DenseDsc and Dense2Net achieve higher accuracy than DenseNet with similar parameter counts or FLOPs. Among efficient CNNs with less than 0.5 M parameters on CIFAR, Dense2Net and DenseDsc achieve state-of-the-art results on CIFAR-10 and CIFAR-100, respectively. DenseDsc and Dense2Net are also very competitive among efficient CNNs with less than 1.0 M parameters on CIFAR. Furthermore, Dense2Net achieves state-of-the-art results on ImageNet among manually designed CNNs with less than 10 M parameters.

Introduction

In recent years, deep neural networks (DNNs) have achieved great success in many fields [1], [2], [3], [4]. For instance, H-LSTCM [5] and CCG-LSTM [6] achieved state-of-the-art results for collective activity recognition. HGDNs [7] can customize a suitable scale for each pixel, showing competitive results on the semantic segmentation task. ResNet [8] can surpass human-level performance on image classification tasks. Recurrent neural networks (RNNs) have outstanding performance in modeling sequential data; SC-RNN [9] can simultaneously capture the spatial coherence among joints and the temporal evolution among skeletons on a co-attention feature map, which efficiently predicts human motion. However, DNNs still have problems to address. For example, DNNs are prone to overfitting when sufficient training data is lacking. Qi [10] proposed Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN), which performed outstandingly on image classification tasks via semi-supervised learning when labeled data is very limited. Generalized deep transfer networks (DTNs) were proposed, which can adequately mitigate the problem of insufficient training images by bringing in rich labels from the textual domain [11], [12].

In DNNs, parameter redundancy causes overfitting and resource consumption, which is even more serious in convolutional neural networks (CNNs). Initially, to obtain high accuracy, CNNs [13], [14], such as the winners of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [15], became very deep and used a great number of channels. VGG-16 [13] has 13 convolutional layers and 3 fully connected layers with 138 million parameters. ResNet-100 [14] and DenseNet-121 [16] have 25 million and 8.1 million parameters, respectively. Xie et al. proposed a filter-in-filter scheme to enhance the expressibility of a filter [17]. A CNN that promotes competition among multiple-size filters was proposed to improve accuracy [18]. These CNNs with high accuracy and abundant parameters are designed for servers with abundant computational resources. Wu et al. tried to find a good compromise between depth and width and proposed a group of relatively shallow CNNs. Consequently, such models cannot be used for real-time inference on low-compute devices. Deep CNNs have a wide range of applications, and sometimes they need to be deployed on low-compute, low-power mobile devices such as self-driving cars, smartphones, and robots. Therefore, CNNs with small model size, few parameters, and low computational cost, yet still high accuracy, are in urgent demand.

To reduce the parameters and FLOPs of CNNs, several approaches for designing efficient CNNs have been proposed, e.g., pruning [19], [20], quantization [21], [22], and more efficient network architectures [23], [24]. Compared with VGG [13], ResNet [14] reduced the computational cost by a factor of 5×, and DenseNet [16] by a factor of 10×, while obtaining higher accuracies on ImageNet. ResNet and DenseNet use 1 × 1 convolution kernels to reduce parameters and computational cost. Furthermore, ResNet addresses the vanishing-gradient problem with shortcut connections, and DenseNet induces feature reuse by directly connecting each layer with all layers before it. However, in ResNet and DenseNet, the standard 3 × 3 and 1 × 1 convolutions still incur a good deal of parameters and computational cost. Subsequently, the more efficient MobileNet [25] reduced computational cost by approximately 25× with competitive accuracy by using efficient depthwise separable convolution.
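The savings from depthwise separable convolution can be checked with a quick parameter count. The following is a minimal sketch using the standard formulas (bias terms omitted); the layer sizes are illustrative and not taken from this paper:

```python
def standard_conv_params(k, c_in, c_out):
    # A k x k standard convolution learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1 x 1 standard convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3 x 3 kernels, 128 input channels, 128 output channels.
std = standard_conv_params(3, 128, 128)        # 147,456 parameters
dsc = depthwise_separable_params(3, 128, 128)  # 17,536 parameters
print(std, dsc, round(std / dsc, 1))           # roughly an 8.4x reduction
```

The ratio is approximately 1/c_out + 1/k², so for 3 × 3 kernels the saving approaches 9× as the channel count grows.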

In this paper, to design more efficient CNNs, we propose two novel and efficient architectures, DenseDsc and Dense2Net, which are stacks of dense blocks. In DenseDsc, each DenseDsc block of the dense block consists of two convolution layers: the first is an efficient depthwise convolution layer composed of two parallel depthwise convolutions, and the second is a 1 × 1 group convolution layer, which fuses information efficiently. In Dense2Net, each Dense2Net block of the dense block consists of a 3 × 3 group convolution layer for extracting features and a 1 × 1 convolution for fusing information and reducing channels. It is well known that both depthwise and group convolutions are more efficient than standard convolution. The proposed DenseDsc and Dense2Net induce feature reuse through dense connections while keeping computation lightweight through efficient convolution methods. We define several hyperparameters for DenseDsc and Dense2Net that make it easy to adjust the model size, allowing DenseDsc and Dense2Net to be used in different cases.
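How dense connectivity interacts with grouped convolution can be sketched with a parameter count over a DenseNet-style block. This is a hedged illustration only: the layer count, growth rate, and channel sizes below are invented for the example, and the real DenseDsc/Dense2Net blocks add 1 × 1 layers and channel shuffle that are not modeled here:

```python
def dense_block_conv_params(num_layers, growth, c0, kernel=3, groups=1):
    # Each layer consumes the concatenation of the block input and all earlier
    # layer outputs (dense connectivity) and emits `growth` new channels.
    total, c_in = 0, c0
    for _ in range(num_layers):
        assert c_in % groups == 0 and growth % groups == 0
        # A grouped k x k convolution splits channels into `groups` parts,
        # dividing the parameter count of the layer by `groups`.
        total += kernel * kernel * (c_in // groups) * (growth // groups) * groups
        c_in += growth  # outputs are concatenated onto the running input
    return total

# Illustrative block: 4 layers, growth rate 8, 16 input channels.
plain   = dense_block_conv_params(4, 8, 16)            # 8,064 parameters
grouped = dense_block_conv_params(4, 8, 16, groups=4)  # 2,016 parameters
print(plain, grouped)  # grouping by 4 cuts the 3 x 3 parameters by 4x
```

Because the input channel count grows linearly inside a dense block, the 3 × 3 layers dominate the cost, which is why replacing them with grouped or depthwise convolutions pays off.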

We evaluate DenseDsc and Dense2Net on two highly competitive benchmark datasets, CIFAR and ImageNet. On CIFAR, the experimental results show that Dense2Net and DenseDsc achieve state-of-the-art results on CIFAR-10 and CIFAR-100, respectively. Compared with efficient CNNs with less than 1.0 M parameters on CIFAR, the accuracies of the two proposed CNNs are very competitive. On ImageNet, Dense2Net achieves state-of-the-art results among manually designed CNNs with less than 10 M parameters.

Our main contributions are summarized as follows.

  • We propose a novel building block called DenseDsc block, which is very efficient by using depthwise separable convolution.

  • We introduce an efficient and novel densely connected CNN called DenseDsc by stacking dense blocks constructed from DenseDsc blocks, which achieves state-of-the-art results on CIFAR-100 among CNNs with less than 0.5 M parameters.

  • We propose an efficient and novel Dense2Net block by using channel shuffle and group convolution operations, which improves the parameter efficiency and the information transfer.

  • We present an efficient densely connected CNN, Dense2Net, which achieves state-of-the-art results on CIFAR-10 among CNNs with less than 0.5 M parameters. The accuracy of Dense2Net on ImageNet is higher than that of all manually designed CNNs with less than 10 M parameters.

  • We study the effect of the growth rate k on accuracy: as k increases, the accuracies of DenseDsc and Dense2Net improve noticeably.

The remainder of this paper is organized as follows. Section 2 presents related work. The details of Dense2Net and DenseDsc are introduced in Section 3. The experimental results are discussed in Section 4, and we conclude in Section 5.


Related work and background

In this section, a brief introduction to approaches for improving convolution efficiency is presented, and DenseNet [16] is reviewed in detail. Besides these kinds of approaches, there are other approaches to efficient CNN architectures, such as knowledge distillation [26] and neural architecture search [27], which are not explored in this paper.

Proposed method

In this section, the two proposed efficient architectures, DenseDsc and Dense2Net, are introduced in detail, and then the parameter efficiency is analyzed. The trend in FLOPs is similar to that of the parameter count.

Experiments and analysis

In this section, we use the CIFAR and ImageNet datasets to evaluate the performance of DenseDsc and Dense2Net, focusing mainly on efficiency and accuracy.

Conclusion

In this paper, we propose two efficient densely connected convolutional neural networks, called DenseDsc and Dense2Net. The two networks take advantage of dense connections to improve feature reuse and use efficient convolutions to improve efficiency. In DenseDsc, efficient depthwise separable convolution is used to improve efficiency. In Dense2Net, we use group convolution to improve parameter efficiency. And the multi-level group convolutions can

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research work was partly supported by the National Natural Science Foundation of China (Project No. 61750110529, 61850410535), the Natural Science Foundation of Jiangsu Province (Project No. BK20161147), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJCX18_0058).

Guoqing Li received the B.S. degree from Qingdao University, Qingdao, China, in 2014, and the M.S. degree from South China Normal University, Guangzhou, China, in 2017. He is currently pursuing the Ph.D. degree with the National ASIC Engineering Technology Research Center, School of Electronics Science and Engineering, Southeast University, Nanjing, China. His current research interests include computer vision, convolutional neural networks, and deep learning hardware accelerators.

References (43)

  • X. Shu et al., Hierarchical long short-term concurrent memory for human interaction recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)
  • J. Tang et al., Coherence constrained graph LSTM for group activity recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)
  • G.-J. Qi, Hierarchically gated deep networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • K. He et al., Identity mappings in deep residual networks, European Conference on Computer Vision (2016)
  • X. Shu, L. Zhang, G.-J. Qi, W. Liu, J. Tang, Spatiotemporal co-attention recurrent neural networks for human-skeleton...
  • G. Qi, Loss-sensitive generative adversarial networks on Lipschitz densities, Int. J. Comput. Vis. (2020)
  • X. Shu et al., Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation, Proceedings of the 23rd ACM International Conference on Multimedia (2015)
  • J. Tang et al., Generalized deep transfer networks for knowledge propagation in heterogeneous domains, ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) (2016)
  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, 3rd International Conference on Learning Representations (2015)
  • K. He et al., Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (2016)
  • O. Russakovsky et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (2015)


    Meng Zhang received the B.S. degree in Electrical Engineering from the China University of Mining and Technology, Xuzhou, China, in 1986, and the M.S. and Ph.D. degrees in Bioelectronics Engineering from Southeast University, Nanjing, China, in 1993 and 2014, respectively. He is currently a professor in the National ASIC System Research Center, College of Electronic Science and Engineering, Southeast University, Nanjing, PR China. He is a faculty adviser of Ph.D. graduates. His research interests include digital signal and image processing, digital communication systems, wireless sensor networks, information security and assurance, cryptography, digital integrated circuit design, and machine learning. He is an author or coauthor of more than 50 refereed journal and international conference papers and a holder of more than 60 patents, including some PCT and US patents.

    Jiaojie Li is an M.S. student at the National ASIC Center in the School of Microelectronics, Southeast University, China. He received the B.S. degree from Dalian University of Technology, Dalian, China, in 2018. His research interests include digital integrated circuit design and deep learning techniques.

    Feng Lv is an M.S. student at the National ASIC Center in the School of Microelectronics, Southeast University, China. He received the B.S. degree from Southeast University, Nanjing, China, in 2018. His research interests include computer vision and pattern recognition.

    Guodong Tong received the B.S. and M.S. degrees from the Yangzhou University of Physical Science and Technology, Yangzhou, China, in 2013 and 2018, respectively. He worked for Huawei and China Mobile for one year. He is currently pursuing the Ph.D. degree with the School of Electronic Science & Engineering, National ASIC System Engineer Technology Research Center, Southeast University, Nanjing, China. His research interests include circuit timing analysis, AI, and GCN processing of network tables. He is a student member of the IEEE.
