Abstract
Wheat is one of the most important crops in China, and its yield directly affects the country's food security. Because wheat spikes are densely distributed, frequently overlapping, and relatively indistinct, they are prone to being missed in practical detection. Existing object detection models also suffer from large model sizes, high computational complexity, and long inference times. This study therefore proposes YOLO-LF, a lightweight real-time wheat spike detection model. First, a lightweight backbone network is designed to reduce the model size and the number of parameters, thereby improving runtime speed. Second, the neck is redesigned for the wheat spike dataset to strengthen the network's feature extraction for wheat spikes while keeping it lightweight. Finally, a lightweight detection head is designed that substantially reduces the model's FLOPs. Experimental results on the test set show that our model occupies 1.7 MB, has 0.76 M parameters, and requires 2.9 GFLOPs, reductions of 73%, 74%, and 64%, respectively, compared with YOLOv8n. Our model achieves a latency of 8.6 ms (115 FPS) on a Titan X, versus 10.2 ms (97 FPS) for YOLOv8n on the same hardware. Our model is thus lighter and faster, while its mAP@0.5 decreases by only 0.9%, giving it better overall performance than YOLOv8 and other mainstream detection networks. It can therefore be deployed on mobile devices to provide effective assistance in the real-time detection of wheat spikes.
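The reported throughput follows from the latency: FPS ≈ 1000 / latency in ms, so 8.6 ms corresponds to roughly 116 FPS, in line with the 115 FPS reported. As a minimal sketch of how such figures are typically measured (not the authors' benchmarking code), the following times single-image inference for a generic PyTorch detection model; the 640×640 input resolution, warm-up count, and run count are illustrative assumptions.

```python
import time
import torch

def benchmark(model: torch.nn.Module, imgsz: int = 640, runs: int = 100):
    """Return (mean latency in ms, FPS) for single-image inference."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(1, 3, imgsz, imgsz, device=device)  # dummy input image

    with torch.no_grad():
        for _ in range(10):           # warm-up so one-time CUDA init is excluded
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        latency_ms = (time.perf_counter() - start) / runs * 1000.0

    return latency_ms, 1000.0 / latency_ms  # e.g. 8.6 ms -> ~116 FPS
```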
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported in part by the Humanities and Social Sciences Planning Fund Project of the Ministry of Education of China under Grant 23YJAZH226, "Research on the Development Path of Artificial Intelligence Based on ChatGPT-like Generated Content" (2023-09 to 2026-08), and in part by the Hunan Provincial Natural Science Foundation of China under Grants 2024JJ5042 and 2023JJ30050.
Author information
Contributions
Sr.Z provided financial support, conceived the experimental ideas, proposed the main methods of this paper, and reviewed and revised the first draft. Sz.L prepared the dataset, conducted the experiments, wrote the initial draft of the paper, and created the figures and experimental visualizations. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests and no financial or personal relationships with other people or organizations that could inappropriately influence this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, S., Long, S. YOLO-LF: a lightweight multi-scale feature fusion algorithm for wheat spike detection. J Real-Time Image Proc 21, 148 (2024). https://doi.org/10.1007/s11554-024-01529-2