Abstract
Panoptic segmentation assigns a unique mask to each object in an image, enabling precise identification and localization of the elements in railway scenes and thereby providing crucial data support for autonomous perception tasks in railway environments. However, existing segmentation methods fail to exploit the prominent boundary and linear features of objects such as railway tracks and guardrails, resulting in unsatisfactory segmentation performance in railway scenes. Moreover, the inherent structural limitations of generic segmentation methods lead to weak feature extraction capabilities. Accordingly, this paper proposes the F2RAIL panoptic segmentation network, which unifies multi-scale detection and high-precision recognition by fusing Feature Pyramid Networks (FPN) with transformer networks. An edge feature enhancement module addresses the insufficient use of linear features in railway scenes, and a multi-dimensional enhancement module addresses the weakening or even loss of deep feature information in segmentation models. With these structural and methodological improvements, F2RAIL achieves a panoptic quality (PQ) of 43.74% on our custom railway dataset, a 2.2% improvement over existing state-of-the-art (SOTA) methods, and performs comparably to SOTA methods on public benchmark datasets.
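For readers unfamiliar with the PQ metric reported above, the following is a minimal sketch of the standard per-class definition: the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The function name, its arguments, and the example numbers are hypothetical and chosen for illustration only; this is not the evaluation code used for F2RAIL.

```python
from typing import Sequence


def panoptic_quality(matched_ious: Sequence[float], num_fp: int, num_fn: int) -> float:
    """PQ for a single class under the standard definition.

    matched_ious: IoU of each predicted/ground-truth segment pair counted as a
                  true positive (a pair matches only when IoU > 0.5).
    num_fp:       number of predicted segments left unmatched.
    num_fn:       number of ground-truth segments left unmatched.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    num_tp = len(matched_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # Convention of this sketch: a class absent from both prediction and
        # ground truth yields 0.0. Such classes are usually left out of the
        # average over classes.
        return 0.0
    return sum(matched_ious) / denominator


# Hypothetical numbers for illustration only, not taken from the paper.
example_pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(f"PQ = {example_pq:.2%}")
```

The dataset-level PQ is then typically the mean of the per-class values over the classes present in the evaluation set.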
Data Availability
The COCO dataset used in this study is publicly available. The railway dataset, however, is a digital asset of our laboratory; apart from the portions presented in the paper, it cannot be made publicly available.
Funding
This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant 2023JBZD002, in part by the Beijing Natural Science Foundation under Grant L231002, and in part by the National Natural Science Foundation of China under Grant 52202486.
Author information
Contributions
Dingyuan Bai constructed and trained the F2RAIL model, conducted the comparative and ablation experiments, and wrote the paper. Baoqing Guo and Tao Ruan provided overall guidance and funding. Xingfang Zhou, Tao Sun, Yu Wang, and Tao Liu collected relevant literature and reviewed the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical and informed consent for data used
We ensure compliance with the data ethics requirements set by Springer Publisher and the journal Applied Intelligence.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dingyuan, B., Baoqing, G., Tao, R. et al. F2RAIL: panoptic segmentation integrating Fpn and transFormer towards RAILway. Appl Intell 55, 287 (2025). https://doi.org/10.1007/s10489-024-06158-7
DOI: https://doi.org/10.1007/s10489-024-06158-7