Pattern Recognition

Volume 117, September 2021, 107924

BoundaryMix: Generating pseudo-training images for improving segmentation with scribble annotations

https://doi.org/10.1016/j.patcog.2021.107924

Highlights

  • BoundaryMix is proposed for scribble-supervised semantic segmentation.

  • BoundaryMix supplements the missing boundary information of scribble annotations by generating pseudo training images and annotations.

  • Scribbles are used to annotate remote sensing images, showing that scribble annotation is also suitable for different scenarios.

  • Experiments on PASCAL VOC and POTSDAM datasets show that BoundaryMix almost closes the gap between weakly-supervised and fully-supervised semantic segmentation.

Abstract

Weakly-supervised semantic segmentation, a promising way to alleviate the burden of collecting per-pixel annotations, aims to train a segmentation model from partial weak annotations. Scribbles on objects are one of the commonly used weak annotations and have been shown to be sufficient for learning a decent segmentation model. Despite being effective, scribble-based weakly-supervised learning methods often lead to imprecise segmentation on object boundaries. This is mainly because scribble annotations are usually located inside the objects, so the dataset lacks annotations close to the semantic boundaries. To alleviate this issue, this paper proposes a simple-but-effective solution, i.e., BoundaryMix, which generates pseudo training image-annotation pairs from the original images to supplement the missing semantic boundaries. Specifically, given a segmentation prediction, we cut off the error-prone regions around the estimated boundaries and replace them with contents from another image, which in effect creates new samples with less ambiguity around semantic boundaries. By training on scribbles and the on-the-fly generated pseudo annotations, the network acquires better prediction capability around the boundary region and thus improves the overall segmentation performance. Experiments on the PASCAL VOC 2012 and POTSDAM datasets with only scribble annotations demonstrate the excellent performance of the proposed method and show that it almost closes the gap between scribble-supervised and fully-supervised image segmentation.

Introduction

Semantic segmentation aims to assign a semantic label to each pixel in a given image. It plays an important role in many areas, such as automatic driving [1], disease diagnosis [2], and urban planning [3]. Recent methods [4], [5], [6], [7] have achieved outstanding performance using fully convolutional networks (FCNs), which heavily rely on labor-intensive pixel-level annotations (Fig. 1(a)) for training. To alleviate the burden of annotation, weakly-supervised semantic segmentation has recently been intensively investigated to train segmentation models using weak annotations of different forms, e.g., image-level annotations [8], [9], [10], [11], points [12], [13], scribbles [14], [15], [16], [17], and bounding boxes [18], [19], [20]. As one of the commonly used weak annotations, scribbles on objects have shown their effectiveness in learning decent segmentation models with less labelling effort. Annotators draw only scribbles inside the objects to indicate the semantic categories, saving the effort of annotating boundaries (Fig. 1(b)). This is particularly convenient for annotating “stuff” with ambiguous boundaries (e.g., trees with leaves) or not-well-defined shapes (e.g., land use types in remote sensing imagery) [14].

Although scribble annotations are effective, training a segmentation model with them tends to produce unsatisfactory results around semantic boundaries. This is because the scribble annotations are usually located inside the objects, and thus the dataset lacks annotations close to the semantic boundaries where categories change. Existing scribble-supervised methods [14], [15], [16], [17] try to propagate the scribble annotations to unlabelled pixels and refine the boundaries by using graph-based strategies [21], [22] or additional boundary detection models [23].

In this paper, we take a different perspective on handling the drawback of scribble annotation. Instead of introducing additional assumptions or processes (e.g., a boundary detector), we investigate supplementing the missing boundary annotation by generating pseudo image-annotation pairs that have less boundary annotation ambiguity. We develop a simple-but-effective approach, referred to as BoundaryMix, that generates those pairs by selectively mixing the images and the segmentation predictions of two samples. Specifically, for an image and its segmentation prediction, we cut off the regions around the estimated boundaries and replace them with the contents or predictions from another image to obtain a new image or its associated pseudo annotation, respectively. The pseudo annotation generated by BoundaryMix tends to be more accurate in the boundary region than the direct segmentation prediction from the original image, because the original erroneous boundary pixels have been removed and a large portion of the replacement pixels do not come from semantic boundaries. Also, the new boundaries produced by BoundaryMix remain visually similar to the original ones, since BoundaryMix largely preserves the original object shape and image contents and only removes a restricted set of pixels along the semantic boundary. The pseudo image-annotation pairs generated by BoundaryMix are therefore of higher quality. By training on scribbles and the on-the-fly generated pseudo annotations, the network can acquire better prediction capability for boundary pixels.
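To make the mixing idea concrete, the error-prone band can be estimated directly from a predicted label map. Below is a minimal NumPy/SciPy sketch; the helper name `boundary_region` and the band half-width `r` are our own illustrative choices, not details taken from the paper:

```python
import numpy as np
from scipy import ndimage

def boundary_region(pred_labels: np.ndarray, r: int = 5) -> np.ndarray:
    """Estimate the error-prone band around predicted semantic boundaries.

    pred_labels: (H, W) integer label map predicted by the network.
    r:           half-width of the band in pixels (illustrative choice).
    Returns a boolean mask that is True inside the boundary band.
    """
    # A pixel lies on a boundary if its label differs from the pixel
    # above it or to its left (a one-pixel-thick transition map).
    up = np.zeros_like(pred_labels, dtype=bool)
    left = np.zeros_like(pred_labels, dtype=bool)
    up[1:, :] = pred_labels[1:, :] != pred_labels[:-1, :]
    left[:, 1:] = pred_labels[:, 1:] != pred_labels[:, :-1]
    edges = up | left

    # Dilate the thin boundary into a band of half-width r on each side.
    return ndimage.binary_dilation(edges, iterations=r)
```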

We have conducted an intensive experimental study on two datasets: the PASCAL VOC 2012 dataset and the POTSDAM dataset. PASCAL VOC 2012 is the only dataset provided with scribble annotations, and it contains many images with only one large object in the center. To illustrate that scribble annotations can be applied to scene segmentation datasets with more complex images, and to show the effectiveness of our proposed method, we have manually annotated the POTSDAM dataset with scribbles for comparisons in remote sensing scenarios. Compared with PASCAL VOC 2012, each remote sensing image in POTSDAM contains many objects, and some objects have ambiguous boundaries and no well-defined shape. Scribbles can significantly alleviate the burden of collecting pixel-level annotations, but they also pose more significant challenges for semantic segmentation in remote sensing scenarios, as shown in Fig. 2.

To summarise, our main contributions are:

  • 1.

    We propose a novel method, BoundaryMix, for scribble-supervised semantic segmentation. It supplements the missing boundary information of scribble annotations by generating pseudo training images and annotations.

  • 2.

    We use scribbles to annotate remote sensing images in the POTSDAM dataset and show through our experiments that scribble annotation is also suitable for different scenarios. The scribble annotations will be made publicly available.

  • 3.

    We conduct experiments on PASCAL VOC 2012 and POTSDAM datasets with scribble annotations. The experimental results show that our proposed method achieves superior performance and almost closes the gap between weakly-supervised and fully-supervised image segmentation.

Section snippets

Weakly-supervised semantic segmentation

Weakly-supervised semantic segmentation is proposed to alleviate the burden of collecting pixel-level annotations. Common weak annotations include bounding boxes [18], [19], [20], scribbles [14], [15], [16], [17], points [12], [13], and image-level annotations [8], [9], [10], [11]. In this paper, we explore the issue of weakly-supervised segmentation using scribble annotations. Existing methods can be roughly divided into three categories: using graph-based methods to generate pseudo annotations,

Scribble-supervised semantic segmentation

Scribble annotation is a convenient way to allow a user to specify the object-of-interest. It is drawn in a few strokes inside the object, as shown in Fig. 1(c). Scribble-supervised semantic segmentation uses scribbles as the only supervision for training a segmentation model. For pixels on the scribbles, the ground-truth classes are given.

Formally, we assume that the training data, $\mathcal{T}=\{(X_i, Y_i)\}_{i=1}^{N}$, consists of $N$ images $X_i$ and their corresponding annotations $Y_i$. For scribble supervision, not
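Under this notation, scribble supervision is commonly trained with a cross-entropy loss evaluated only on the annotated scribble pixels. A hedged PyTorch sketch; the function name and the convention of marking unannotated pixels with 255 are our assumptions, not necessarily the paper's:

```python
import torch
import torch.nn.functional as F

def scribble_loss(logits: torch.Tensor, scribbles: torch.Tensor) -> torch.Tensor:
    """Cross-entropy restricted to annotated scribble pixels.

    logits:    (B, C, H, W) raw class scores from the segmentation network.
    scribbles: (B, H, W) long tensor of labels; 255 marks pixels without
               any scribble annotation (assumed convention).
    """
    # ignore_index drops unannotated pixels from the loss, so only the
    # sparse scribble pixels contribute gradients.
    return F.cross_entropy(logits, scribbles, ignore_index=255)
```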

Creating pseudo samples with the BoundaryMix operation

Motivated by the above observation and analysis, in this paper, we propose a simple-but-effective solution named BoundaryMix to supplement the missing annotations around the boundary regions. Our idea is to create pseudo training samples that are close to the original images but with less boundary annotation ambiguity. We propose to achieve this by selectively mixing the content and the segmentation predictions of two images to create a pseudo image and its annotation, as shown in Fig. 4.
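As a rough illustration of this mixing step, the operation can be written as a masked copy between two samples. This sketch reuses the hypothetical `boundary_region` helper from earlier and reflects our own reading of the described operation, not the authors' released code:

```python
import numpy as np

def boundary_mix(img_a, pred_a, img_b, pred_b, r: int = 5):
    """Create a pseudo image-annotation pair: the band around image A's
    estimated boundaries is replaced with content and predictions from B.

    img_a, img_b:   (H, W, 3) images of the same size.
    pred_a, pred_b: (H, W) predicted label maps for the two images.
    Returns (pseudo_img, pseudo_ann).
    """
    band = boundary_region(pred_a, r)   # error-prone region of sample A
    pseudo_img = img_a.copy()
    pseudo_ann = pred_a.copy()
    pseudo_img[band] = img_b[band]      # fill the band with B's content
    pseudo_ann[band] = pred_b[band]     # and B's (mostly interior) labels
    return pseudo_img, pseudo_ann
```

In training, each image could be paired with a randomly chosen partner every iteration so that fresh pseudo samples are generated on the fly.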

Dataset and evaluation metric

Our proposed method is trained and evaluated on two datasets: PASCAL VOC 2012 [35] and POTSDAM. PASCAL VOC 2012 is a commonly used dataset for semantic segmentation, and it is also the only dataset that has been provided with scribble annotations. To illustrate that scribble annotations can be applied in more applications, and to show the effectiveness of our method, we use scribbles to annotate remote sensing images in the
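The snippet above is truncated before the metric is named. Segmentation on PASCAL VOC is conventionally scored with mean intersection-over-union (mIoU); assuming that convention, a minimal sketch:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes, skipping classes absent from both maps.

    pred, gt: (H, W) integer label maps; 255 in gt marks ignored
              pixels (assumed convention).
    """
    valid = gt != 255
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c)[valid].sum()
        union = np.logical_or(pred == c, gt == c)[valid].sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```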

Conclusion

In this work, we investigate the limitations of scribble annotations and propose a simple-but-effective approach, BoundaryMix, for scribble-supervised semantic segmentation. Specifically, it generates pseudo image-annotation pairs that have less boundary annotation ambiguity to supplement the missing boundary annotation of scribble. In addition to this, we provide the scribble annotations for a remote sensing dataset, POTSDAM, to illustrate that scribble annotation can be applied in more

Funding

The work was supported by the National Natural Science Foundation of China under Grant 61725105.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The work was supported by the National Natural Science Foundation of China under Grant 61725105. The authors would also like to thank the anonymous reviewers for their very competent comments and helpful suggestions.

References (40)

  • P.-T. Jiang et al., Integral object mining via online attention accumulation, in: Proceedings of the IEEE International Conference on Computer Vision, 2019.

  • Y. Wei et al., Object region mining with adversarial erasing: a simple classification to semantic segmentation approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

  • A. Bearman et al., What's the point: semantic segmentation with point supervision, in: European Conference on Computer Vision, 2016.

  • R. Qian et al., Weakly supervised scene parsing with point-based distance metric learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.

  • D. Lin et al., ScribbleSup: scribble-supervised convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

  • M. Tang et al., On regularized losses for weakly-supervised CNN segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018.

  • B. Wang et al., Boundary perception guidance: a scribble-supervised semantic segmentation approach, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019.

  • A. Obukhov, S. Georgoulis, D. Dai, L. Van Gool, Gated CRF loss for weakly supervised semantic image segmentation, arXiv...

  • J. Dai et al., BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation, in: Proceedings of the IEEE International Conference on Computer Vision, 2015.

  • C. Song et al., Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.

    Wanxuan Lu received the B.Sc. degree from Beijing Institute of Technology, Beijing, China, in 2016. He is currently pursuing the Ph.D. degree with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, and remote sensing image processing.

    Kun Fu received the B.Sc., M.Sc., and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1995, 1999, and 2002, respectively. He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, remote sensing image understanding, geospatial data mining, and visualization.

    Dong Gong received the Ph.D. and B.S. degrees in computer science from Northwestern Polytechnical University, Xi’an, China, in 2018 and 2012, respectively. He is currently a research fellow at The University of Adelaide, Australia. He was a joint-training Ph.D. student with The University of Adelaide from 2015 to 2016. His research interests include machine learning and optimization techniques and their applications in image processing and computer vision.

    Xian Sun received the B.Sc. degree from Beihang University, Beijing, China, in 2004, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2006 and 2009, respectively. He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote sensing image understanding.

    Wenhui Diao received the B.Sc. degree from Xidian University, Xi’an, China, in 2011, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, China, in 2016. He is currently an Assistant Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote sensing image analysis.

    Lingqiao Liu received the B.Sc. and M.Sc. degrees in communication engineering from the University of Electronic Science and Technology of China, Chengdu, in 2006 and 2009, respectively, and the Ph.D. degree from the Australian National University, Canberra, in 2014. He is now a Senior Lecturer at the University of Adelaide. In 2016, he was awarded the Discovery Early Career Researcher Award from the Australian Research Council. His current research interests include low-supervision learning and various topics in computer vision and natural language processing.

    Work was done while W. Lu was visiting the University of Adelaide. Correspondence should be addressed to L. Liu and K. Fu.
