IIRNet: A lightweight deep neural network using intensely inverted residuals for image recognition

https://doi.org/10.1016/j.imavis.2019.10.005

Highlights

  • A lightweight and efficient convolutional neural network architecture is constructed.

  • Intensely inverted residual and multi-scale low-redundancy convolutions are used to reduce the model size and complexity.

  • The proposed network achieves classification accuracy comparable to mainstream compact network architectures.

  • Balanced performance is obtained on three challenging datasets.

Abstract

Deep neural networks have achieved great success in many pattern recognition tasks. However, their large model sizes and high computational costs limit their application in resource-limited systems. In this paper, we focus on designing a lightweight and efficient convolutional neural network architecture by directly training a compact network for image recognition. To achieve a good balance among classification accuracy, model size, and computational complexity, we propose a lightweight convolutional neural network architecture, named IIRNet, for resource-limited systems. The new architecture is built on the Intensely Inverted Residual block (IIR block), which decreases the redundancy of the convolutional blocks. By utilizing two new operations, intensely inverted residuals and multi-scale low-redundancy convolutions, the IIR block greatly reduces model size and computational cost while matching the classification accuracy of state-of-the-art networks. Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the superior performance of IIRNet in the trade-offs among classification accuracy, computational complexity, and model size, compared to mainstream compact network architectures.

Introduction

Convolutional neural networks (CNNs) have been widely used in various pattern recognition applications [1–4] ever since AlexNet [5] won the ImageNet Large Scale Visual Recognition Challenge in 2012 (ILSVRC 2012). Since then, many sophisticated CNN designs, e.g. VGGNet [6], GoogLeNet [7], and ResNet [8], have been proposed and have achieved great gains in classification accuracy on many computer vision datasets by deepening the network. However, deep convolutional neural networks with superhuman accuracy usually require substantial computational resources and storage memory, which limits their use in many resource-limited systems.

Besides classification accuracy, small model size and low computational cost are also critical for deploying deep convolutional neural networks on resource-limited platforms. In recent years, many techniques have been proposed to construct lightweight neural networks and have achieved promising results [9,14,23,24]. One important family of approaches first trains a neural network for high classification accuracy without constraints on model size or computational complexity, and then compresses the pretrained network with techniques such as low-rank decomposition [9–13], network pruning [14–17], low-bit quantization [18–20], or knowledge transfer [21,22]. Lebedev et al. presented a two-step method that speeds up the convolutional layers of CNNs through tensor decomposition and discriminative fine-tuning [10]. Luo et al. decreased the model size of neural networks through filter-level pruning [16]. Wang et al. proposed a two-step quantization method that reduces both storage and computational complexity by quantizing the activations and the weights [19]. Chen et al. utilized knowledge distillation and hint learning to learn compact and fast object detection networks [22].
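As a concrete illustration of the compress-after-training strategy, the sketch below shows a generic magnitude-based filter pruning step in PyTorch. The L1 selection criterion, the keep_ratio parameter, and the helper name are assumptions made purely for illustration; published methods such as ThiNet [16] choose filters using the statistics of the following layer and fine-tune the pruned network afterwards.

```python
import torch
import torch.nn as nn

def prune_filters_by_l1(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Toy filter-level pruning: keep the filters with the largest L1 norms
    and rebuild a thinner convolution (assumes groups=1). This simplified
    criterion is an illustrative assumption, not the method of [16]."""
    weights = conv.weight.data                    # shape: (out_ch, in_ch, kH, kW)
    scores = weights.abs().sum(dim=(1, 2, 3))     # L1 norm of each filter
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores, n_keep).indices     # indices of filters to retain

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weights[keep].clone()    # copy the surviving filters
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned
```

Pruning whole filters, rather than individual weights, keeps the resulting convolution dense, so the smaller layer runs efficiently without requiring sparse kernels.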

Rather than constructing a complex network and then compressing the pretrained architecture to reduce redundancy, another family of approaches directly constructs a compact architecture. For example, Howard et al. proposed a lightweight CNN, MobileNet, which uses depthwise separable convolutions to build a compact network [23]. Sandler et al. designed a compact architecture, MobileNetV2, based on an inverted residual structure [24]. Since these approaches construct lightweight networks directly, they avoid the complicated training steps required for complex networks and keep the growth of network parameters and operations under control. Because this strategy is easier to implement and extends more readily, many recent works design compact architectures directly [25–29].
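To make these directly designed building blocks concrete, the following PyTorch sketch outlines a depthwise separable convolution in the spirit of MobileNet [23] and an inverted residual bottleneck in the spirit of MobileNetV2 [24]. The layer ordering and the expansion factor follow the cited designs, but the fixed stride, the omitted skip connection, and other details are simplified assumptions.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """Depthwise separable convolution: a per-channel 3x3 depthwise conv
    followed by a 1x1 pointwise conv, as popularized by MobileNet."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),      # depthwise filtering
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise combination
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def inverted_residual(ch, expansion=6):
    """Inverted residual bottleneck: expand to a wide representation,
    filter it depthwise, then project back with a linear 1x1 conv.
    The skip connection around the block is omitted for brevity."""
    hidden = ch * expansion
    return nn.Sequential(
        nn.Conv2d(ch, hidden, 1, bias=False),     # expansion
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, hidden, 3, padding=1,
                  groups=hidden, bias=False),     # depthwise filtering
        nn.BatchNorm2d(hidden),
        nn.ReLU6(inplace=True),
        nn.Conv2d(hidden, ch, 1, bias=False),     # linear projection
        nn.BatchNorm2d(ch),
    )
```

The inverted residual expands a narrow representation before the depthwise filtering and projects it back with a linear 1 × 1 convolution; the IIR block introduced next pushes this inversion further by filtering the compressed representation directly.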

This paper proposes a lightweight neural network architecture that is designed to be compact from the outset and is especially suitable for resource-limited systems. The proposed network is composed of several stacked Intensely Inverted Residual (IIR) blocks. As shown in Fig. 1, an IIR block consists of multi-scale low-redundancy convolutions, structured sparse 1 × 1 convolutions (also called structured sparse point-wise convolutions), and an intermediate multi-branch concatenation operation. An IIR block first filters the low-dimensional compressed representation of the input with multi-scale lightweight channel-wise kernels in six separate parallel branches, and then concatenates the branch outputs into features. These features are subsequently projected back to a low-dimensional representation with structured sparse linear point-wise convolutions, which help information flow across feature channels. Coupled with the low-redundancy filters, the IIR block is further compressed so that the network can be deployed on resource-limited systems. The whole architecture, named IIRNet (Intensely Inverted Residual Network), is constructed by stacking multiple such blocks on top of one another. Experimental results show that IIRNet achieves competitive performance on the CIFAR-10, CIFAR-100, and ImageNet datasets in terms of accuracy, number of operations, and number of parameters, compared to state-of-the-art compact architecture designs.
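The sketch below gives one plausible reading of the IIR block just described: six parallel multi-scale channel-wise (depthwise) convolutions applied directly to the low-dimensional input, channel concatenation, and a grouped linear 1 × 1 projection standing in for the structured sparse point-wise convolution. The specific kernel sizes, the group count, and the residual connection are assumptions made for illustration and may differ from the configuration shown in Fig. 1.

```python
import torch
import torch.nn as nn

class IIRBlockSketch(nn.Module):
    """Illustrative sketch of an Intensely Inverted Residual (IIR) block.

    The branch kernel sizes, the grouped 1x1 projection, and the residual
    connection are assumptions; the paper's exact configuration may differ.
    """
    def __init__(self, channels, kernel_sizes=(1, 3, 3, 5, 5, 7), groups=2):
        super().__init__()
        # Six parallel branches of multi-scale channel-wise (depthwise)
        # convolutions applied directly to the low-dimensional input.
        self.branches = nn.ModuleList()
        for k in kernel_sizes:
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=k,
                          padding=k // 2, groups=channels, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ))
        concat_channels = channels * len(kernel_sizes)
        # "Structured sparse" linear point-wise convolution, approximated
        # here by a grouped 1x1 convolution with no non-linearity.
        self.project = nn.Sequential(
            nn.Conv2d(concat_channels, channels, kernel_size=1,
                      groups=groups, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Concatenate the multi-scale branch outputs along the channel axis,
        # then project back to the low-dimensional representation.
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.project(feats) + x  # residual connection (assumed)
```

Applied to a 24-channel feature map, for instance, IIRBlockSketch(channels=24) returns a feature map of the same shape, so such blocks can be stacked directly.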

Section snippets

Related work

In order to deploy high-accuracy CNN models on resource-limited platforms, researchers pursue a good balance among model size, computational complexity, and classification accuracy when designing models. Since directly constructing compact neural networks avoids training complex networks in advance, more and more recent works prefer to build compact architectures and train small networks directly. Iandola et al. proposed a directly designed

Intensely inverted residual network

Considering the trade-offs among accuracy, computational complexity, and the number of parameters, this paper proposes a new lightweight and efficient network architecture, IIRNet (Intensely Inverted Residual Network). The proposed network is built on an efficient block, the IIR block, which is composed of multi-scale low-redundancy convolutions, structured sparse point-wise convolutions, and an intermediate multi-branch concatenation operation. To design the compact network directly, we introduce the

CIFAR-10 and CIFAR-100

Experiments are performed on the CIFAR datasets [35] to demonstrate the performance of IIRNet. CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset [36]. We evaluate IIRNet on the CIFAR datasets against MobileNets, IGC networks, and other state-of-the-art compact models, comparing the number of parameters, computational complexity, and classification accuracy.

Conclusion

In this paper, we focus on constructing a lightweight and efficient convolutional neural network architecture by directly training a compact network. To decrease the redundancy in the convolutional block, we present the Intensely Inverted Residual block (IIR block) and use it to construct a lightweight network, the Intensely Inverted Residual Network (IIRNet). Focusing on the trade-offs among accuracy, computational complexity, and model size, we introduce intensely inverted residual

Acknowledgements

This work was supported by Guangzhou Municipal Science and Technology Project of China (201903010040), and the Science and Technology Planning Project of Guangdong Province of China (2019B070702004).

References (51)

  • X. Zhang et al.

    Accelerating very deep convolutional networks for classification and detection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • V. Lebedev et al.

    Speeding-up convolutional neural networks using fine-tuned CP-decomposition

  • Y.D. Kim et al.

    Compression of deep convolutional neural networks for fast and low power mobile applications

  • A. Novikov et al.

    Tensorizing neural networks

  • W. Wang et al.

    Wide compression: tensor ring nets

  • M. Ren et al.

    SBNet: Sparse Blocks Network for fast inference

  • J. Yoon et al.

    Combined group and exclusive sparsity for deep neural networks

  • J. Luo et al.

    ThiNet: a filter level pruning method for deep neural network compression

  • Y. He et al.

    Channel pruning for accelerating very deep neural networks

  • C. Leng et al.

    Extremely low bit neural network: squeeze the last bit out with ADMM

  • P. Wang et al.

    Two-step quantization for low-bit neural networks

  • D. Alistarh et al.

    QSGD: communication-efficient SGD via gradient quantization and encoding

  • J. Yim et al.

    A gift from knowledge distillation: fast optimization, network minimization and transfer learning

  • G. Chen et al.

    Learning efficient object detection models with knowledge distillation

  • A.G. Howard et al.

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    (2017)

This paper has been recommended for acceptance by Sinisa Todorovic.
