Soft-Boundary Label Relaxation with class placement constraints for semantic segmentation of the railway environment
Introduction
Railways are a valued means of transportation owing to their speed, capacity, and reliability, and railway networks extend over more than a million kilometers around the globe. To sustain these characteristics, railway operators place particular emphasis on safety and accident prevention. From simple railway signals to more advanced Automatic Train Stop (ATS) systems, various technologies are used to ensure the safety of passengers. However, the geographical / geometrical positions and the types of trackside facilities are currently collected manually at a high human cost, and some railway operators do not even know where and what trackside facilities exist along their tracks because of management problems across departments. Daily maintenance of such facilities is also essential, yet it is still carried out by manual and/or visual inspection. Therefore, railway operators are in crucial need of a fully automatic technology that can collect data about trackside facilities and support their maintenance. To meet this need, the use of semantic segmentation for railway environment understanding is currently being considered.
Semantic segmentation, the task of assigning a single semantic label to every pixel within an image, can be used to understand the surrounding environment in detail. Almost every modern semantic segmentation method uses a Convolutional Neural Network (CNN), and thus requires supervised data [3]. For this reason, building an adequate dataset for semantic segmentation is a substantial issue. A typical pixel-level manual annotation of an image takes more than an hour [4]. Training a CNN model generally requires a massive volume of training data, and constructing a large-scale dataset for every application is unrealistic. Although domain adaptation has been studied as a way to transfer training results to similar domains, such as from synthetic to real-world data [10], this approach cannot be applied across dissimilar domains, such as from the street environment to the railway environment.
To cope with such lack of sufficient training data, Zhu et al. [18] originally proposed joint image-label propagation to generate pseudo-data using a small number of labeled images and neighboring sequential unlabeled images for the street-view image domain. They also introduced Boundary Label Relaxation (BLR) to cope with distorted training data generated by joint image-label propagation. In joint image-label propagation, pseudo-data are generated by transforming both an image and its label using densely calculated optical flows between sequential frames. During this transformation, multiple labels can propagate to a single pixel location, thus making the class boundary ambiguous. BLR is introduced to take into account such ambiguity during the training of a CNN. They were able to augment the Cityscapes dataset [4] to eleven times its original size and train a CNN effectively using this method, but the following problems remain:
- (i) Large-scale augmentation is only possible when the dataset is captured at a high frame rate and with small camera movement. For data captured by cameras mounted on trains, the high operating speed of the trains makes it difficult to propagate labels when generating pseudo-data. Distant pixels can be propagated to a single location, increasing the label ambiguity. However, the original method [18] only applies BLR to areas within distance 1 of a label boundary.
- (ii) BLR modifies the label space to allow multiple labels as ground truth at a single pixel location (Fig. 1). This means that inferences with the wrong label alignment may be considered correct.
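The propagation step and the relaxation described above can be illustrated with a small sketch. It is not the implementation of [18]: the function names `propagate_labels` and `blr_loss` are hypothetical, the flow field is a hand-written toy, and the loss is written in the form we understand BLR to take, namely the negative log of the total probability assigned to all labels accepted at a pixel.

```python
import math


def propagate_labels(labels, flow):
    """Forward-warp a label map with a dense flow field (toy version).

    labels: H x W nested list of class ids for frame t.
    flow:   H x W nested list of (dy, dx) displacements towards frame t+1.
    Returns an H x W grid of sets holding every class id that landed on
    each target pixel. A set with more than one id is an ambiguous pixel.
    """
    h, w = len(labels), len(labels[0])
    landed = [[set() for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            ty, tx = y + round(dy), x + round(dx)
            if 0 <= ty < h and 0 <= tx < w:
                landed[ty][tx].add(labels[y][x])
    return landed


def blr_loss(probs, allowed):
    """Relaxed loss (assumed form): -log of the summed probability of the allowed classes."""
    return -math.log(sum(probs[c] for c in allowed))


# One image row with a boundary between class 0 and class 1.
labels = [[0, 0, 1, 1]]
# Only the second pixel moves, one step to the right, onto a class-1 pixel.
flow = [[(0, 0), (0, 1), (0, 0), (0, 0)]]
landed = propagate_labels(labels, flow)

# Two predictions over three classes that disagree about the boundary pixel.
favors_0 = [0.8, 0.1, 0.1]
favors_1 = [0.1, 0.8, 0.1]
loss_0 = blr_loss(favors_0, landed[0][2])
loss_1 = blr_loss(favors_1, landed[0][2])
```

In this toy case the third pixel receives both labels, and the relaxed loss is the same whichever of the two classes the prediction favors. That is the behavior problem (ii) points at: the relaxation alone cannot tell a correctly aligned prediction from a misaligned one.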
To tackle these problems, and based on the fact that label boundary distortions of more than a single pixel appear when propagating labels of railway images, we propose a “Soft-BLR” that considers a larger width at the label boundary. We introduce a novel loss function for the CNN that penalizes inference results based on their distance from the relaxed label boundary, in order to solve the misalignment problem present in the original BLR. This loss function also enables smoother training of the CNN, as close misclassifications receive lower loss values than distant ones. These efforts contribute to accurate semantic segmentation of railway environments, even in cases where original annotations for training are barely available. We demonstrate the effectiveness of the proposed method through experimental analysis, and discuss future applications and possible further improvements.
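The idea that close misclassifications should cost less than distant ones can be made concrete with a toy penalty. The exact loss function is not given in this excerpt, so the following is only an illustrative stand-in and not the proposed Soft-BLR loss: the names `class_distances` and `distance_penalty`, the cap `width`, and the choice of an expected capped distance are all assumptions made for this sketch, which works on a single 1-D row of labels.

```python
def class_distances(row, num_classes, width):
    """For each pixel of a 1-D label row, distance to the nearest pixel of every class.

    Distances are capped at `width` (hypothetical relaxation width); a class
    that does not occur in the row also gets the capped value.
    """
    dists = []
    for x in range(len(row)):
        per_class = []
        for c in range(num_classes):
            hits = [abs(x - i) for i, lab in enumerate(row) if lab == c]
            per_class.append(min(min(hits), width) if hits else width)
        dists.append(per_class)
    return dists


def distance_penalty(probs, dist, width):
    """Expected capped distance of the predicted class, scaled to [0, 1]."""
    return sum(p * d for p, d in zip(probs, dist)) / width


row = [0, 0, 0, 1, 1, 1]          # boundary between pixel 2 and pixel 3
width = 3
dists = class_distances(row, num_classes=2, width=width)

wrong = [0.2, 0.8]                # prediction that favors class 1
near = distance_penalty(wrong, dists[2], width)   # class-0 pixel next to the boundary
far = distance_penalty(wrong, dists[0], width)    # class-0 pixel far from the boundary
```

With these numbers, predicting class 1 right next to the boundary is penalized less than predicting it three pixels away, and a prediction that matches the label is not penalized at all. A real implementation would compute the distances on 2-D label maps, for example with a distance transform.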
Section snippets
Semantic segmentation and datasets
Numerous studies on general semantic segmentation have been conducted in recent years. The use of CNNs became popular with the emergence of the Fully Convolutional Network (FCN) [11], and the trend has been followed by many state-of-the-art models such as SegNet [2], PSPNet [17], and DeepLabv3+ [3].
There are some datasets aiming at different domains for the purpose of training a CNN for semantic segmentation. However, existing large-scale datasets mostly consist of either object-wise images [6], or
Overview of the proposed method
We build upon the idea that, for a train front-view image sequence augmented with joint image-label propagation, the class boundaries will be more distorted and misaligned than those of street scenes. Using the original Boundary Label Relaxation (BLR) would not be sufficient, as severe distortions cause multiple labels to propagate to a single pixel location and make the class boundary ambiguous. Moreover, it does not take into account the order of the classes assigned to pixels around the
Experimental evaluation
In this section, we evaluate the proposed Soft-BLR method and compare the results against existing methods. First, we evaluate it on a widely used public dataset to gain a general understanding of the effects of soft-boundary label relaxation. Then, we use private train front-view datasets to test the specific effects of the proposed method on the railway environment.
Effectiveness of the proposed method
The results of Experiment 1 (Section 4.2) show that, on a general dataset, not only the modified loss function but also the widened BLR is effective. We did not augment the training data, so in theory there should be no distortion in the label boundaries. However, pixel-level ground-truth annotations made by human annotators are not always perfect. The true label boundary can be off by several pixels, and in such cases the widened BLR can help the CNN to not focus too
Conclusion
In this paper, we focused on the challenging task of the semantic segmentation of train front-view images and proposed the Soft-Boundary Label Relaxation (Soft-BLR) method as a solution. It extends the width of the class boundary to multiple pixels to cope with more severely distorted pseudo-data. Furthermore, we proposed a novel loss function to penalize inference results based on the distance from the label boundary to solve the misalignment problem.
Through experimental evaluation, we
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
Parts of this research were supported by MEXT, Grant-in-Aid for Scientific Research JP17H00745.
References (18)
- et al., Label propagation in video sequences, Proc. 2010 IEEE Conf. Comput. Vis. Pattern Recognit. (2010)
- et al., SegNet: a deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, Proc. 15th Eur. Conf. Comput. Vis. (Part VII) (2018)
- et al., The Cityscapes dataset for semantic urban scene understanding, Proc. 2016 IEEE Conf. Comput. Vis. Pattern Recognit. (2016)
- et al., ImageNet: a large-scale hierarchical image database, Proc. 2009 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2009)
- et al., The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. (2010)
- et al., Semantic segmentation of railway images considering temporal continuity, Proc. 5th Asian Conf. Pattern Recognit. (Part I) (2019)
- et al., Semantic video CNNs through representation warping, Proc. 16th IEEE Int. Conf. Comput. Vis. (2017)
- et al., FlowNet 2.0: Evolution of optical flow estimation with deep networks, Proc. 2017 IEEE Conf. Comput. Vis. Pattern Recognit. (2017)