Elsevier

Pattern Recognition Letters

Volume 150, October 2021, Pages 258-264

Soft-Boundary Label Relaxation with class placement constraints for semantic segmentation of the railway environment

https://doi.org/10.1016/j.patrec.2021.07.014

Highlights

  • Proposes semantic segmentation for railway environmental understanding.

  • Soft-Boundary Label Relaxation addresses misaligned label boundaries in railway images.

  • The proposed method is effective on a challenging train front-view dataset.

  • The proposed method is also effective on a general street-view dataset.

Abstract

In this paper, we focus on the challenging task of the semantic segmentation of train front-view images. Trackside facilities can be managed using detailed and precise information about the surrounding railway environment. Semantic segmentation enables us to understand the 2D environment, but no adequate large-scale dataset is available for training a CNN for this purpose. Some attempts have been made to generate pseudo-data from unlabeled sequential frames to compensate for the small volume of training data, but the moving speed of trains makes it difficult to apply them directly. We aim to solve this problem by proposing the Soft-Boundary Label Relaxation (Soft-BLR) method, which considers label boundaries extending over multiple pixels to cope with more severely distorted pseudo-data and to better train the CNN in the initial training stage. Furthermore, we modify the loss function to penalize inference results based on their distance from the label boundary, to solve the misalignment problems of border pixels. Through experimental evaluation, we report that the proposed method outperforms previous methods not only on the semantic segmentation of challenging railway images, but also on that of general street-view images.

Introduction

Railways are a valued means of transportation due to their speed, capacity, and reliability, and they extend for a total of more than a million kilometers around the globe. Given these characteristics, railway operators place particular emphasis on safety and the prevention of accidents. From simple railway signals to more advanced Automatic Train Stop (ATS) systems, various technologies are used to ensure the safety of passengers. However, the geographical/geometrical positions and the types of trackside facilities are currently collected manually at high human cost, and some railway operators are even unaware of where and what trackside facilities exist along their tracks because management is divided among different departments. Daily maintenance of such facilities is also essential, yet it is still done by manual and/or visual inspection. Therefore, railway operators urgently need a fully automatic technology that can collect data about trackside facilities and support their maintenance. To meet this need, the use of semantic segmentation for railway environment understanding is currently being considered.

Semantic segmentation, the task of assigning a single semantic label to every pixel within an image, can be used to understand the surrounding environment in detail. Almost every modern semantic segmentation method utilizes a Convolutional Neural Network (CNN), and thus requires supervised data [3]. For this reason, building an adequate dataset for semantic segmentation is a substantial issue. A typical pixel-level manual annotation of an image takes more than an hour [4]. Training a CNN model generally requires a massive volume of training data, and constructing a large-scale dataset for every application is unrealistic. Although domain adaptation has been studied to transfer training results to similar domains, such as from synthetic to real-world data [10], this approach cannot be applied across dissimilar domains, such as from the street environment to the railway environment.

To cope with this lack of sufficient training data, Zhu et al. [18] originally proposed joint image-label propagation to generate pseudo-data using a small number of labeled images and neighboring sequential unlabeled images for the street-view image domain. They also introduced Boundary Label Relaxation (BLR) to cope with distorted training data generated by joint image-label propagation. In joint image-label propagation, pseudo-data are generated by transforming both an image and its label using densely calculated optical flows between sequential frames. During this transformation, multiple labels can propagate to a single pixel location, thus making the class boundary ambiguous. BLR is introduced to take this ambiguity into account during the training of a CNN. They were able to augment the Cityscapes dataset [4] to eleven times its original size and train a CNN effectively using this method, but the following problems remain:

  • (i) Large-scale augmentation is only possible when the dataset is captured at a high frame rate and with small camera movement. For data captured by cameras mounted on trains, the high operating speed of the trains makes it difficult to propagate labels when generating pseudo-data: distant pixels can be propagated to a single location, which increases label ambiguity. However, the original method [18] only applies BLR to areas within distance 1 of a label boundary.

  • (ii) BLR modifies the label space to allow multiple labels as ground truth at a single pixel location (Fig. 1). This means that inferences with wrong label alignment may be considered correct.
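The two problems above can be made concrete with a minimal NumPy sketch of a relaxed-boundary loss. It assumes the commonly described form of BLR, in which a pixel's loss is the negative log of the probability mass summed over every class present in a small window around it; the function names, the `radius` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def relaxed_label_mask(labels, num_classes, radius=1):
    """Boolean (H, W, C) mask of every class found within `radius` pixels."""
    h, w = labels.shape
    one_hot = np.eye(num_classes, dtype=bool)[labels]  # (H, W, C)
    # Edge padding repeats border pixels, so no new classes are introduced.
    padded = np.pad(one_hot, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    mask = np.zeros_like(one_hot)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            mask |= padded[dy:dy + h, dx:dx + w]
    return mask


def relaxed_loss(probs, labels, radius=1, eps=1e-12):
    """Per-pixel -log of the probability mass summed over the allowed classes."""
    mask = relaxed_label_mask(labels, probs.shape[-1], radius)
    allowed_mass = (probs * mask).sum(axis=-1)
    return -np.log(np.clip(allowed_mass, eps, 1.0))


# Toy 1 x 6 label row: class 0 on the left, class 1 on the right.
labels = np.array([[0, 0, 0, 1, 1, 1]])
# A prediction that is confidently class 1 everywhere (boundary shifted left).
probs = np.tile(np.array([0.01, 0.99]), (1, 6, 1))

loss = relaxed_loss(probs, labels, radius=1)
print(np.round(loss, 3))
```

In this toy case the pixel at index 2 is labeled class 0 but predicted as class 1, and its relaxed loss is still essentially zero because both classes are allowed there, which illustrates problem (ii). With `radius=1` the relaxation does not reach index 1; a larger radius would be needed to cover boundaries displaced by more than one pixel, which is the situation described in problem (i).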

To tackle these problems, and based on the observation that label boundary distortions of more than a single pixel appear when propagating labels of railway images, we propose “Soft-BLR”, which considers a wider band around the label boundary. We introduce a novel loss function for the CNN that penalizes inference results based on their distance from the relaxed label boundary, to solve the misalignment problems present in the original BLR. This loss function also enables smoother training of the CNN, as close misclassifications have lower loss values than distant misclassifications. These efforts contribute to accurate semantic segmentation of railway environments, even in cases where very few original annotations are available for training. We demonstrate the effectiveness of the proposed method through experimental analysis, and discuss future applications and possible further improvements.
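The exact Soft-BLR loss is not given in this excerpt, so the following is only an illustrative sketch of the general idea of a distance-dependent penalty over a wider boundary band. Everything specific in it is an assumption: the linear decay `1 - d / (radius + 1)`, the use of Chebyshev distance, and the names `distance_weights` and `soft_relaxed_loss` are hypothetical and should not be read as the authors' formulation.

```python
import numpy as np


def distance_weights(labels, num_classes, radius=3):
    """(H, W, C) weights: 1 for a pixel's own class, decaying linearly with the
    Chebyshev distance to the nearest pixel of each other class, 0 beyond `radius`."""
    h, w = labels.shape
    one_hot = np.eye(num_classes, dtype=bool)[labels]
    padded = np.pad(one_hot, ((radius, radius), (radius, radius), (0, 0)),
                    mode="constant", constant_values=False)
    weights = np.zeros((h, w, num_classes))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            dist = max(abs(dy), abs(dx))
            decay = 1.0 - dist / (radius + 1)  # assumed linear decay, not from the paper
            shifted = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            # Keep the largest weight, i.e. the one from the nearest occurrence.
            weights = np.maximum(weights, shifted * decay)
    return weights


def soft_relaxed_loss(probs, labels, radius=3, eps=1e-12):
    """Per-pixel -log of the distance-weighted probability mass."""
    weights = distance_weights(labels, probs.shape[-1], radius)
    return -np.log(np.clip((probs * weights).sum(axis=-1), eps, 1.0))


# Toy 1 x 8 label row with the true boundary between indices 3 and 4.
labels = np.array([[0, 0, 0, 0, 1, 1, 1, 1]])
# A prediction that assigns class 1 to every pixel.
probs = np.tile(np.array([0.0, 1.0]), (1, 8, 1))

weights = distance_weights(labels, 2, radius=3)
loss = soft_relaxed_loss(probs, labels, radius=3)
print(np.round(loss, 3))
```

Under these assumptions, a misclassification right next to the boundary (index 3) receives a small but non-zero loss, one three pixels away (index 1) receives a larger loss, and one outside the relaxed band (index 0) is penalized heavily, so close misclassifications cost less than distant ones while misaligned predictions are no longer treated as fully correct.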

Section snippets

Semantic segmentation and datasets

Numerous studies on general semantic segmentation have been conducted in recent years. The use of CNNs became popular with the emergence of the Fully Convolutional Network (FCN) [11], and the trend has been followed by many state-of-the-art models such as SegNet [2], PSPNet [17], and DeepLabv3+ [3].

There are some datasets aiming at different domains for the purpose of training a CNN for semantic segmentation. However, existing large-scale datasets mostly consist of either object-wise images [6], or

Overview of the proposed method

We build upon the idea that, for a train front-view image sequence augmented with joint image-label propagation, the class boundaries will be more distorted and misaligned than those of street scenes. Using the original Boundary Label Relaxation (BLR) would not be sufficient, as severe distortions cause multiple labels to propagate to a single pixel location and make the class boundary ambiguous. Moreover, it does not take into account the order of the classes assigned to pixels around the

Experimental evaluation

In this section, we evaluate the proposed Soft-BLR method and compare the results against existing methods. First, we evaluate on a widely used public dataset for a general understanding of the effects of soft-boundary label relaxation. Then, we use private train front-view datasets to test the specific effects of the proposed method on the railway environment.

Effectiveness of the proposed method

From the results of Experiment 1 (Section 4.2), the effectiveness of not only the modified loss function, but also widening the width of BLR on a general dataset was observed. We did not augment the training data, so in theory there should not be any distortion in label boundaries. However, the ground-truth pixel-level annotations by human annotators are not always perfect. The true label boundary can be off by several pixels, and in such cases the widened BLR can help the CNN to not focus too

Conclusion

In this paper, we focused on the challenging task of the semantic segmentation of train front-view images and proposed the Soft-Boundary Label Relaxation (Soft-BLR) method as a solution. It extends the width of the class boundary to multiple pixels to cope with more severely distorted pseudo-data. Furthermore, we proposed a novel loss function to penalize inference results based on the distance from the label boundary to solve the misalignment problem.

Through experimental evaluation, we

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

Parts of this research were supported by MEXT, Grant-in-Aid for Scientific Research JP17H00745.

References (18)

  • [1] V. Badrinarayanan et al., Label propagation in video sequences, Proc. 2010 IEEE Conf. Comput. Vis. Pattern Recognit. (2010)

  • [2] V. Badrinarayanan et al., SegNet: a deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2017)

  • [3] L.C. Chen et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, Proc. 15th Eur. Conf. Comput. Vis. (Part VII) (2018)

  • [4] M. Cordts et al., The Cityscapes dataset for semantic urban scene understanding, Proc. 2016 IEEE Conf. Comput. Vis. Pattern Recognit. (2016)

  • [5] J. Deng et al., ImageNet: a large-scale hierarchical image database, Proc. 2009 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2009)

  • [6] M. Everingham et al., The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. (2010)

  • [7] Y. Furitsu et al., Semantic segmentation of railway images considering temporal continuity, Proc. 5th Asian Conf. Pattern Recognit. (Part I) (2019)

  • [8] R. Gadde et al., Semantic video CNNs through representation warping, Proc. 16th IEEE Int. Conf. Comput. Vis. (2017)

  • [9] E. Ilg et al., FlowNet 2.0: evolution of optical flow estimation with deep networks, Proc. 2017 IEEE Conf. Comput. Vis. Pattern Recognit. (2017)

There are more references available in the full text version of this article.
