Abstract
In this work, we describe a novel and highly efficient convolutional neural network for image recognition, which we term the “Cross Connected Network” (CrossNet). We introduce the Pod structure, in which the feature maps of depthwise convolutions can be reused within the same Pod through cross connections. This design gives CrossNet high performance at a low computational cost, making it especially suitable for mobile devices with very limited computing power. Additionally, we find that depthwise convolutions with large receptive fields offer better accuracy/computation trade-offs and further improve CrossNet's performance. Our experiments on ImageNet classification and MS COCO object detection demonstrate that CrossNet improves on the state-of-the-art performance of lightweight networks such as MobileNets-V1/-V2, ShuffleNet, and CondenseNet. We also measure actual inference time on an ARM-based mobile device, where CrossNet again achieves the best performance. Code and models are publicly available at https://github.com/soeaver/CrossNet-PyTorch.
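The full Pod design is specified in the paper itself; purely as an illustration of the idea sketched in the abstract, a minimal PyTorch Pod-like block might look as follows. The class name `PodSketch`, the number of stages, and the channel bookkeeping are our assumptions for this sketch, not the authors' implementation (see the linked repository for the released code).

```python
import torch
import torch.nn as nn

class PodSketch(nn.Module):
    """Illustrative Pod-like block (hypothetical, not the authors' code).

    Each stage runs a depthwise convolution; its feature maps are cached
    and reused by all later stages via channel concatenation, which is
    the "cross connection" idea described in the abstract.
    """

    def __init__(self, channels: int, num_stages: int = 3, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2  # 'same' padding; a 5x5 kernel widens the receptive field cheaply
        self.depthwise = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size, padding=pad,
                      groups=channels, bias=False)        # one filter per input channel
            for _ in range(num_stages))
        self.pointwise = nn.ModuleList(
            nn.Sequential(                                 # 1x1 conv mixes channels
                nn.Conv2d(channels * (i + 1), channels, 1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
            for i in range(num_stages))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        reused = []  # depthwise feature maps kept for cross connections
        out = x
        for dw, pw in zip(self.depthwise, self.pointwise):
            reused.append(dw(out))               # compute and cache depthwise maps
            out = pw(torch.cat(reused, dim=1))   # reuse every earlier depthwise map
        return out

x = torch.randn(1, 32, 56, 56)
print(PodSketch(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```

Because the depthwise feature maps are concatenated rather than recomputed, later stages see them at the cost of only a 1x1 convolution, which is where the favorable accuracy/computation trade-off would come from under this reading.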
Notes
1. In this paper, MFLOPs denotes the number of multiply-add operations in millions, and MParams denotes the number of parameters in millions.
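As a concrete illustration of this accounting (our own worked example, not taken from the paper), the mult-add count of a stride-1, same-padded depthwise separable convolution can be computed as follows:

```python
def sep_conv_mult_adds(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-add count of a k x k depthwise convolution followed by a
    1x1 pointwise convolution (stride 1, 'same' padding), using the
    standard accounting for MobileNets-style networks."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 conv mixing channels
    return depthwise + pointwise

# A 3x3 separable conv, 32 -> 64 channels, on a 112x112 feature map:
print(sep_conv_mult_adds(3, 32, 64, 112, 112) / 1e6)  # ~29.3 MFLOPs
```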
References
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915 (2016)
Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587 (2017)
Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv:1802.02611 (2018)
Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., Feng, J.: Dual path networks. In: NIPS (2017)
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR (2017)
Dai, J., et al.: Deformable convolutional networks. In: ICCV (2017)
Gatys, L., Ecker, A., Bethge, M.: Image style transfer using convolutional neural networks. In: CVPR (2016)
Girshick, R.: Fast R-CNN. In: ICCV (2015)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv:1706.02677 (2017)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Hong, S., Roh, B., Kim, K., Cheon, Y., Park, M.: PVANet: lightweight deep neural networks for real-time object detection. arXiv:1611.08588 (2016)
Howard, A., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017)
Huang, G., Liu, S., Maaten, L., Weinberger, K.: CondenseNet: an efficient DenseNet using learned group convolutions. arXiv:1711.09224 (2017)
Huang, G., Liu, Z., Weinberger, K.: Densely connected convolutional networks. In: CVPR (2017)
Iandola, F., Han, S., Moskewicz, M., Ashraf, K., Dally, W., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)
Li, S., Xu, X., Nie, L., Chua, T.: Laplacian-steered neural style transfer. In: ACM MM (2017)
Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. arXiv:1711.07767 (2017)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
Matan, O., Burges, C., LeCun, Y., Denker, J.: Multi-digit recognition using a space displacement neural network. In: NIPS (1991)
Nair, V., Hinton, G.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameter sharing. arXiv:1802.03268 (2018)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV (2015)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. arXiv:1801.04381 (2018)
Szegedy, C., et al.: Going deeper with convolutions. In: CVPR (2015)
Wang, M., Liu, B., Foroosh, H.: Design of efficient convolutional layers using single intra-channel convolution, topological subdivisioning and spatial bottleneck structure. arXiv:1608.04337 (2016)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)
Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: ACM MM (2016)
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC (2016)
Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv:1707.01083 (2017)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)
Zhu, B., Chen, Y., Wang, J., Liu, S., Zhang, B., Tang, M.: Fast deep matting for portrait animation on mobile phone. In: ACM MM (2017)
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.: Learning transferable architectures for scalable image recognition. arXiv:1707.07012 (2017)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, L., Song, Q., Li, Z., Wu, Y., Li, X., Hu, M. (2019). Cross Connected Network for Efficient Image Recognition. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20886-8
Online ISBN: 978-3-030-20887-5