DLA+: A Light Aggregation Network for Object Classification and Detection

  • Research Article
  • Pattern Recognition
  • International Journal of Automation and Computing

Abstract

Efficient convolutional neural networks (CNNs) play a crucial role in visual tasks such as object classification and detection. The most common way to construct a CNN is to stack identical convolution blocks or build complex connections; such approaches may be effective, but the parameter count and computation grow explosively. We therefore present a novel architecture, "DLA+", which aggregates features from different stages and, through a newly designed convolution block, achieves better accuracy while reducing computation to roughly one sixth of the baseline. We conduct experiments on classification and object detection: on the CIFAR-10 and VOC datasets, DLA+ attains higher precision and faster speed than comparable architectures. The lightweight network also makes it possible to deploy to low-performance devices such as drones and laptops.
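
The exact DLA+ block and aggregation scheme are not reproduced on this page, but the two ideas the abstract names, a lightweight convolution block and aggregation of features from different stages, can be illustrated with a short PyTorch sketch. Everything below (the LightConvBlock and StageAggregation names, the depthwise-separable design, the concatenate-and-fuse aggregation, and the channel sizes) is an assumption chosen for illustration in the spirit of lightweight designs such as MobileNet and DLA, not the authors' implementation.

```python
# Illustrative sketch only: a hypothetical lightweight block and a simple
# two-stage aggregation node, NOT the DLA+ architecture from the paper.
import torch
import torch.nn as nn


class LightConvBlock(nn.Module):
    """Hypothetical lightweight block: depthwise 3x3 conv + pointwise 1x1 conv."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise convolution: one 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise convolution mixes channels cheaply.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class StageAggregation(nn.Module):
    """Hypothetical aggregation node: upsample the deeper feature map,
    concatenate it with the shallower one, and fuse with a 1x1 convolution.
    Assumes the deeper map has half the spatial resolution of the shallower one."""

    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(shallow_ch + deep_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, shallow, deep):
        return self.fuse(torch.cat([shallow, self.up(deep)], dim=1))


if __name__ == "__main__":
    # Rough parameter comparison: depthwise-separable block vs. a standard 3x3 conv.
    light = LightConvBlock(64, 128)
    standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
    n_light = sum(p.numel() for p in light.parameters())
    n_std = sum(p.numel() for p in standard.parameters())
    print(f"light block: {n_light} params, standard 3x3 conv: {n_std} params")
```

For the channel sizes above, the depthwise-separable block holds roughly 9 000 parameters versus about 74 000 for a plain 3x3 convolution, an eight-fold saving; this illustrates, but does not reproduce, the kind of reduction the abstract reports for DLA+.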

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions, which were very helpful in improving this paper. This work was supported by the University Synergy Innovation Program of Anhui Province (No. GXXT-2019-007), Corporative Information Processing and Deep Mining for Intelligent Robot (No. JCYJ20170817155854115), the Major Project for New Generation of AI (No. 2018AAA0100400), and the Anhui Provincial Natural Science Foundation (No. 1908085MF206).

Author information

Corresponding author

Correspondence to Xin Wang.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Fu-Tian Wang received the B.Eng. degree in computer science and technology, the M.Eng. degree in computer application, and the Ph.D. degree in computer software and theory from Anhui University, China in 2005, 2009 and 2017, respectively. He has been a teacher at Anhui University, China since 2009.

His research interests include image processing, computer vision and edge computing.

Li Yang received the B.Eng. degree in electrical engineering and automation from Luoyang Institute of Technology, China in 2017. He is currently a master's student in computer science and technology at Anhui University, China.

His research interests include computer vision, object detection and model compression.

Jin Tang received the B.Eng. degree in automation and the Ph.D. degree in computer science from Anhui University, China in 1999 and 2007, respectively. He is currently a professor with the School of Computer Science and Technology, Anhui University, China.

His research interests include computer vision, pattern recognition, machine learning and deep learning.

Si-Bao Chen received the B.Sc. and M.Sc. degrees in probability and statistics and the Ph.D. degree in computer science from Anhui University, China in 2000, 2003 and 2006, respectively. From 2006 to 2008, he was a postdoctoral researcher at the Department of Electronic Engineering and Information Science, University of Science and Technology of China. Since 2008, he has been a teacher at Anhui University. He was a visiting scholar at the University of Texas at Arlington, USA from 2014 to 2015.

His research interests include image processing, pattern recognition, machine learning and computer vision.

Xin Wang received the B.Sc. degree from the Department of Precision Machinery and Precision Instruments, University of Science and Technology of China in 1998. She is currently the technical director of Shenzhen Raixun Information Technology Co., Ltd. and a researcher at Peking University Shenzhen Institute, China.

Her research interests include multimedia information processing, speech recognition and Internet security.

About this article

Cite this article

Wang, FT., Yang, L., Tang, J. et al. DLA+: A Light Aggregation Network for Object Classification and Detection. Int. J. Autom. Comput. 18, 963–972 (2021). https://doi.org/10.1007/s11633-021-1287-y
