Abstract
This study presents a dual-path decoder segmentation network (DPDS), which introduces a dual-path decoder structure into a semantic segmentation network that incorporates atrous spatial pyramid pooling (ASPP). A novel loss function, boundary focal loss (BFLoss), is designed specifically for wheat ear segmentation; it adaptively adjusts the weights of individual pixels through the binarization of boundary information, focusing training on the edges of wheat ears. We propose applying the DPDS network together with BFLoss to the semantic segmentation of wheat ears. Experimental results show that BFLoss has advantages over the commonly used binary cross-entropy loss (BCELoss) and focal loss in semantic segmentation. In addition, the dual-path decoder architecture was shown to achieve higher precision than activating only one of its pathways. In comparative experiments with established semantic segmentation networks, the DPDS model achieved the best performance on several evaluation metrics and attained a balance between precision and recall. Notably, the combination of DPDS and BFLoss achieved an F1 score of 91.86% on the wheat ear semantic segmentation test dataset. The DPDS model can therefore be applied effectively to the semantic segmentation of crops such as wheat, and it also offers new insights for improving existing networks. Code is available at https://github.com/awesome-pythoner/dual-path-decoder-segment.
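The abstract does not state the formula for BFLoss, so the following is only an illustrative sketch of the general idea of weighting a focal loss by a binarized boundary map. It is not the authors' implementation, which is available in the repository linked above. The function names `boundary_mask` and `boundary_weighted_focal_loss`, the 4-neighbour boundary rule, and the parameters `gamma` and `boundary_weight` are all assumptions made for illustration.

```python
import math


def boundary_mask(mask):
    """Binarized boundary map (assumed rule): a pixel is marked 1 when at
    least one of its 4-neighbours carries a different label."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def boundary_weighted_focal_loss(probs, mask, gamma=2.0, boundary_weight=2.0, eps=1e-7):
    """Mean binary focal loss in which boundary pixels are scaled by
    `boundary_weight` (hypothetical weighting, not the published BFLoss)."""
    edges = boundary_mask(mask)
    total, count = 0.0, 0
    for y, row in enumerate(mask):
        for x, target in enumerate(row):
            # Clamp the predicted foreground probability to avoid log(0).
            p = min(max(probs[y][x], eps), 1.0 - eps)
            # Probability assigned to the true class of this pixel.
            p_t = p if target == 1 else 1.0 - p
            focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
            weight = boundary_weight if edges[y][x] else 1.0
            total += weight * focal
            count += 1
    return total / count


# Toy example: a 2x2 foreground square inside a 4x4 image.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
toy_probs = [[0.5] * 4 for _ in range(4)]
print(boundary_weighted_focal_loss(toy_probs, toy_mask))
```

Setting `boundary_weight=1.0` reduces this sketch to an ordinary focal loss, which makes it easy to compare the two on the same prediction.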
Data availability statement
Data will be made available on reasonable request.
Acknowledgements
This work was supported by the Primary Research & Development Plan of Jiangsu Province (BE2022389).
Author information
Contributions
All authors contributed to the study’s conception and design. System construction, data collection, and analysis were performed by Yu Chen and Lihui Wang. The first draft of the manuscript was written by Yu Chen. The revision was completed with the help of Lihui Wang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
None.
About this article
Cite this article
Wang, L., Chen, Y. Dual-path decoder architecture for semantic segmentation of wheat ears. Appl Intell 55, 128 (2025). https://doi.org/10.1007/s10489-024-06023-7
DOI: https://doi.org/10.1007/s10489-024-06023-7