Abstract
An efficient convolutional neural network (CNN) plays a crucial role in various visual tasks such as object classification and detection. The most common way to construct a CNN is to stack identical convolution blocks or complex connections. These approaches can be effective, but the parameter count and computational cost grow explosively. We therefore present a novel architecture called "DLA+", which aggregates features from different stages and, through a newly designed convolution block, achieves better accuracy while reducing computation six-fold compared to the baseline. We conduct experiments on classification and object detection: on the CIFAR10 and VOC datasets, we obtain better precision and faster speed than competing architectures. The lightweight network even allows deployment on low-performance devices such as drones and laptops.
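This page does not describe the internals of the new convolution block, but the lightweight designs this line of work builds on (e.g., MobileNet, Xception) obtain savings of the claimed magnitude by replacing a standard convolution with a depthwise convolution followed by a 1×1 pointwise convolution. The arithmetic below is an illustrative sketch of that cost comparison, not the authors' actual block; the function names and example sizes are ours.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution on an h x w map."""
    return h * w * k * k * c_in * c_out

def dws_conv_macs(h, w, k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Example: 3 x 3 kernels, 64 -> 64 channels, 32 x 32 feature map.
standard = conv_macs(32, 32, 3, 64, 64)
separable = dws_conv_macs(32, 32, 3, 64, 64)
print(round(standard / separable, 2))  # 7.89
```

The reduction factor is 1 / (1/c_out + 1/k²), so for 3×3 kernels it sits in roughly the 6×–9× range depending on channel width, which is consistent with the six-fold figure quoted in the abstract.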
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions, which were very helpful in improving this paper. This work was supported by the University Synergy Innovation Program of Anhui Province (No. GXXT-2019-007), Cooperative Information Processing and Deep Mining for Intelligent Robot (No. JCYJ20170817155854115), the Major Project for New Generation of AI (No. 2018AAA0100400), and the Anhui Provincial Natural Science Foundation (No. 1908085MF206).
Author information
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Fu-Tian Wang received the B.Eng. degree in computer science and technology, the M.Eng. degree in computer application, and the Ph.D. degree in computer software and theory from Anhui University, China in 2005, 2009 and 2017, respectively. He has been a teacher at Anhui University, China since 2009.
His research interests include image processing, computer vision and edge computing.
Li Yang received the B.Eng. degree in electrical engineering and automation from Luoyang Institute of Technology, China in 2017. He is currently a master student in computer science and technology at Anhui University, China.
His research interests include computer vision, object detection and model compression.
Jin Tang received the B.Eng. degree in automation and the Ph.D. degree in computer science from Anhui University, China in 1999 and 2007, respectively. He is currently a professor with the School of Computer Science and Technology, Anhui University, China.
His research interests include computer vision, pattern recognition, machine learning and deep learning.
Si-Bao Chen received the B.Sc. and M.Sc. degrees in probability and statistics and the Ph.D. degree in computer science from Anhui University, China in 2000, 2003 and 2006, respectively. From 2006 to 2008, he was a postdoctoral researcher at the Department of Electronic Engineering and Information Science, University of Science and Technology of China. Since 2008, he has been a teacher at Anhui University. He was a visiting scholar at the University of Texas at Arlington, USA from 2014 to 2015.
His research interests include image processing, pattern recognition, machine learning and computer vision.
Xin Wang received the B.Sc. degree from the Department of Precision Machinery and Precision Instruments, University of Science and Technology of China in 1998. She is currently the technical director of Shenzhen Raixun Information Technology Co., Ltd., and a researcher at Peking University Shenzhen Institute, China.
Her research interests include multimedia information processing, speech recognition and Internet security.
Cite this article
Wang, FT., Yang, L., Tang, J. et al. DLA+: A Light Aggregation Network for Object Classification and Detection. Int. J. Autom. Comput. 18, 963–972 (2021). https://doi.org/10.1007/s11633-021-1287-y