Abstract
An efficient convolutional neural network (CNN) plays a crucial role in various visual tasks such as object classification and detection. The most common way to construct a CNN is to stack identical convolution blocks or complex connections. These approaches can be effective, but the parameter count and computational cost grow explosively. We therefore present a novel architecture called "DLA+", which aggregates features from different stages and, through a newly designed convolution block, achieves better accuracy while reducing computation six-fold compared to the baseline. We conduct experiments on classification and object detection: on the CIFAR10 and VOC datasets, we obtain better precision and faster speed than competing architectures. The lightweight network even allows deployment on low-performance devices such as drones and laptops.
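This page does not describe the internals of the new convolution block, but the lightweight designs this line of work builds on (e.g., MobileNet, Xception) obtain savings of the claimed magnitude by replacing a standard convolution with a depthwise convolution followed by a 1×1 pointwise convolution. The arithmetic below is an illustrative sketch of that cost comparison, not the authors' actual block; the function names and example sizes are ours.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates for a standard k x k convolution on an h x w map."""
    return h * w * k * k * c_in * c_out

def dws_conv_macs(h, w, k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Example: 3 x 3 kernels, 64 -> 64 channels, 32 x 32 feature map.
standard = conv_macs(32, 32, 3, 64, 64)
separable = dws_conv_macs(32, 32, 3, 64, 64)
print(round(standard / separable, 2))  # 7.89
```

The reduction factor is 1 / (1/c_out + 1/k²), so for 3×3 kernels it sits in roughly the 6×–9× range depending on channel width, which is consistent with the six-fold figure quoted in the abstract.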
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions, which were very helpful in improving this paper. This work was supported by the University Synergy Innovation Program of Anhui Province (No. GXXT-2019-007), Cooperative Information Processing and Deep Mining for Intelligent Robot (No. JCYJ20170817155854115), the Major Project for New Generation of AI (No. 2018AAA0100400), and the Anhui Provincial Natural Science Foundation (No. 1908085MF206).
Author information
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Fu-Tian Wang received the B.Eng. degree in computer science and technology, the M.Eng. degree in computer application, and the Ph.D. degree in computer software and theory from Anhui University, China in 2005, 2009 and 2017, respectively. He has been a teacher at Anhui University, China since 2009.
His research interests include image processing, computer vision and edge computing.
Li Yang received the B.Eng. degree in electrical engineering and automation from Luoyang Institute of Technology, China in 2017. He is currently a master student in computer science and technology at Anhui University, China.
His research interests include computer vision, object detection and model compression.
Jin Tang received the B.Eng. degree in automation and the Ph.D. degree in computer science from Anhui University, China in 1999 and 2007, respectively. He is currently a professor with the School of Computer Science and Technology, Anhui University, China.
His research interests include computer vision, pattern recognition, machine learning and deep learning.
Si-Bao Chen received the B.Sc. and M.Sc. degrees in probability and statistics and the Ph.D. degree in computer science from Anhui University, China in 2000, 2003 and 2006, respectively. From 2006 to 2008, he was a postdoctoral researcher at the Department of Electronic Engineering and Information Science, University of Science and Technology of China. Since 2008, he has been a teacher at Anhui University. He was a visiting scholar at the University of Texas at Arlington, USA from 2014 to 2015.
His research interests include image processing, pattern recognition, machine learning and computer vision.
Xin Wang received the B.Sc. degree from the Department of Precision Machinery and Precision Instruments, University of Science and Technology of China in 1998. She is currently the technical director of Shenzhen Raixun Information Technology Co., Ltd., and a researcher at Peking University Shenzhen Institute, China.
Her research interests include multimedia information processing, speech recognition and Internet security.
Cite this article
Wang, FT., Yang, L., Tang, J. et al. DLA+: A Light Aggregation Network for Object Classification and Detection. Int. J. Autom. Comput. 18, 963–972 (2021). https://doi.org/10.1007/s11633-021-1287-y