Cross Connected Network for Efficient Image Recognition

  • Conference paper
Computer Vision – ACCV 2018 (ACCV 2018)

Abstract

In this work, we describe a novel and highly efficient convolutional neural network for image recognition, which we term the "Cross Connected Network" (CrossNet). We introduce the Pod structure, in which the feature maps of depthwise convolutions are reused within the same Pod through cross connections. This design gives CrossNet high performance at a low computational cost, making it especially suitable for mobile devices with very limited computing power. Additionally, we find that depthwise convolutions with large receptive fields offer better accuracy/computation trade-offs and further improve CrossNet's performance. Our experiments on ImageNet classification and MS COCO object detection demonstrate that CrossNet improves on the state-of-the-art performance of lightweight networks (such as MobileNets-V1/-V2, ShuffleNets, and CondenseNet). We have also measured actual inference time on an ARM-based mobile device, where CrossNet again achieves the best performance. Code and models are publicly available (https://github.com/soeaver/CrossNet-PyTorch).
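
The Pod structure is only sketched in this abstract; the authors' reference implementation is linked above. As a rough illustration of the idea, the following minimal PyTorch sketch shows one plausible reading of a cross-connected block, where the output of an earlier depthwise convolution is reused (concatenated) alongside a later one instead of being discarded. The module name, channel widths, kernel size, and the exact reuse pattern are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class PodSketch(nn.Module):
    """Hypothetical cross-connected block: two depthwise stages whose
    feature maps are both reused by a pointwise fusion layer."""

    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2

        def dw():  # depthwise conv (groups == channels) + BN + ReLU
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size,
                          padding=pad, groups=channels, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        self.dw1, self.dw2 = dw(), dw()
        # 1x1 pointwise conv fuses the cross-connected (concatenated) maps.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        a = self.dw1(x)
        b = self.dw2(a)
        # Cross connection: reuse the first depthwise feature map
        # alongside the second rather than discarding it.
        return self.fuse(torch.cat([a, b], dim=1))

x = torch.randn(1, 32, 56, 56)
print(PodSketch(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```

The 5x5 default kernel reflects the abstract's observation that depthwise convolutions with larger receptive fields trade off accuracy against computation favorably; depthwise layers keep large kernels cheap because each filter touches only one channel.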

Notes

  1. In this paper, MFLOPs denotes the number of multiply-add operations in millions, and MParams the number of parameters in millions; a small counting sketch follows these notes.

  2. https://pytorch.org/.

  3. https://github.com/Tencent/ncnn.
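
Note 1 fixes the units used throughout the paper. As a back-of-the-envelope sketch (not the authors' measurement code), the helper below counts multiply-adds and parameters for a stride-1 convolution and compares a standard 3x3 layer with its depthwise-separable counterpart; the layer sizes are arbitrary examples.

```python
def conv_cost(h, w, c_in, c_out, k, groups=1):
    """Multiply-adds and parameters for a k x k convolution producing an
    h x w output map (stride 1, 'same' padding, no bias)."""
    params = (c_in // groups) * c_out * k * k
    madds = params * h * w  # one multiply-add per weight per output position
    return madds / 1e6, params / 1e6  # (MFLOPs, MParams) in the paper's units

# Standard 3x3 conv, 32 -> 64 channels on a 112 x 112 map:
std = conv_cost(112, 112, 32, 64, 3)            # (~231.2 MFLOPs, ~0.018 MParams)

# Depthwise-separable equivalent: 3x3 depthwise + 1x1 pointwise.
dw = conv_cost(112, 112, 32, 32, 3, groups=32)  # (~3.6, ~0.0003)
pw = conv_cost(112, 112, 32, 64, 1)             # (~25.7, ~0.002)
print(std, (dw[0] + pw[0], dw[1] + pw[1]))      # roughly an 8x saving
```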

References

  1. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.: DeepLab: semantic image segmentation with deep convolutional Nets, Atrous convolution, and fully connected CRFs. arXiv:1606.00915 (2016)

  2. Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking Atrous convolution for semantic image segmentation. arXiv:1706.05587 (2017)

  3. Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. arXiv:1802.02611 (2018)

  4. Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S., Feng, J.: Dual path networks. In: NIPS (2017)

  5. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR (2017)

  6. Dai, J., et al.: Deformable convolutional networks. In: ICCV (2017)

  7. Gatys, L., Ecker, A., Bethge, M.: Image style transfer using convolutional neural networks. In: CVPR (2016)

  8. Girshick, R.: Fast R-CNN. In: ICCV (2015)

  9. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)

  10. Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv:1706.02677 (2017)

  11. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  13. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38. arXiv:1603.05027

  14. Hong, S., Roh, B., Kim, K., Cheon, Y., Park, M.: PVANet: lightweight deep neural networks for real-time object detection. arXiv:1611.08588 (2016)

  15. Howard, A., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017)

  16. Huang, G., Liu, S., Maaten, L., Weinberger, K.: CondenseNet: an efficient DenseNet using learned group convolutions. arXiv:1711.09224 (2017)

  17. Huang, G., Liu, Z., Weinberger, K.: Densely connected convolutional networks. In: CVPR (2017)

  18. Iandola, F., Han, S., Moskewicz, M., Ashraf, K., Dally, W., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 (2016)

  19. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)

  20. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)

  21. Li, S., Xu, X., Nie, L., Chua, T.: Laplacian-steered neural style transfer. In: ACM MM (2017)

  22. Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)

  23. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  24. Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. arXiv:1711.07767 (2017)

  25. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  26. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)

  27. Matan, O., Burges, C., LeCun, Y., Denker, J.: Multi-digit recognition using a space displacement neural network. In: NIPS (1991)

  28. Nair, V., Hinton, G.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)

  29. Pham, H., Guan, M., Zoph, B., Le, Q., Dean, J.: Efficient neural architecture search via parameter sharing. arXiv:1802.03268 (2018)

  30. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2016)

  31. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)

  32. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV (2015)

  33. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. arXiv:1801.04381 (2018)

  34. Szegedy, C., et al.: Going deeper with convolutions. In: CVPR (2015)

  35. Wang, M., Liu, B., Foroosh, H.: Design of efficient convolutional layers using single intra-channel convolution, topological subdivisioning and spatial bottleneck structure. arXiv:1608.04337 (2016)

  36. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)

  37. Yu, J., Jiang, Y., Wang, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: ACM MM (2016)

  38. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC (2016)

  39. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv:1707.01083 (2017)

  40. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)

  41. Zhu, B., Chen, Y., Wang, J., Liu, S., Zhang, B., Tang, M.: Fast deep matting for portrait animation on mobile phone. In: ACM MM (2017)

  42. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.: Learning transferable architectures for scalable image recognition. arXiv:1707.07012 (2017)

Author information

Corresponding author

Correspondence to Lu Yang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, L., Song, Q., Li, Z., Wu, Y., Li, X., Hu, M. (2019). Cross Connected Network for Efficient Image Recognition. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol. 11361. Springer, Cham. https://doi.org/10.1007/978-3-030-20887-5_4

  • DOI: https://doi.org/10.1007/978-3-030-20887-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20886-8

  • Online ISBN: 978-3-030-20887-5

  • eBook Packages: Computer Science (R0)
