BoundaryMix: Generating pseudo-training images for improving segmentation with scribble annotations☆
Introduction
Semantic segmentation aims to assign a semantic label to each pixel in a given image. It plays an important role in many areas, such as automatic driving [1], disease diagnosis [2], and urban planning [3]. Recent methods [4], [5], [6], [7] have achieved outstanding performance using fully convolutional networks (FCNs), which heavily rely on labor-intensive pixel-level annotations (Fig. 1(a)) for training. To alleviate the annotation burden, weakly-supervised semantic segmentation has recently been intensively investigated; it trains segmentation models using weak annotations of different forms, e.g., image-level annotations [8], [9], [10], [11], points [12], [13], scribbles [14], [15], [16], [17], and bounding boxes [18], [19], [20]. As one of the commonly used weak annotations, scribbles on objects have proven effective for learning decent segmentation models with less labelling effort. Annotators draw only scribbles inside the objects to indicate the semantic categories, saving the effort of annotating boundaries (Fig. 1(b)). This is particularly convenient for annotating “stuff” with ambiguous boundaries (e.g., trees with leaves) or not-well-defined shapes (e.g., land-use types in remote sensing imagery) [14].
Although scribble annotations are effective, training a segmentation model with them tends to produce unsatisfactory results around semantic boundaries. This is because the scribble annotations are usually located inside the objects, so the dataset lacks annotations close to the semantic boundaries where categories change. Existing scribble-supervised methods [14], [15], [16], [17] try to propagate the scribble annotations to unlabelled pixels and refine the boundaries by using graph-based strategies [21], [22] or additional boundary detection models [23].
In this paper, we propose a different perspective to handle this drawback of scribble annotation. Instead of introducing additional assumptions or processes (e.g., a boundary detector), we investigate supplementing the missing boundary annotation by generating pseudo image-annotation pairs that have less boundary annotation ambiguity. A simple-but-effective approach, referred to as BoundaryMix, is developed to generate those pairs by selectively mixing the images and the segmentation predictions of two samples. Specifically, for an image and its segmentation prediction, we cut off the regions around the estimated boundaries and replace them with the contents or predictions from another image to obtain a new image and its associated pseudo annotation, respectively. The pseudo annotation generated by BoundaryMix tends to be more accurate in the boundary region than the direct segmentation prediction from the original image. This is because the original erroneous boundary-region pixels have been removed, and a large portion of the replacement pixels are not from semantic boundaries. Also, the new boundaries produced by BoundaryMix remain visually similar to the original ones because BoundaryMix largely preserves the original object shape and image contents: it only removes a restricted set of pixels along the semantic boundary. Therefore, the pseudo image-annotation pairs generated by BoundaryMix are of higher quality. By training on scribbles and the on-the-fly generated pseudo annotations, the network can acquire better prediction capability for boundary pixels.
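The "regions around the estimated boundaries" can be obtained from a predicted label map by marking pixels whose label differs from a neighbour and then growing that edge set by a few pixels. The paper's exact procedure and band width are not reproduced here; the following is a minimal NumPy sketch with illustrative names (`boundary_band`, `width`):

```python
import numpy as np

def boundary_band(label_map, width=2):
    """Return a boolean mask marking pixels within `width` pixels of a
    predicted semantic boundary (where 4-neighbour labels differ)."""
    lbl = np.asarray(label_map)
    # A pixel lies on a boundary if its label differs from a 4-neighbour.
    edge = np.zeros(lbl.shape, dtype=bool)
    edge[:-1, :] |= lbl[:-1, :] != lbl[1:, :]
    edge[1:, :]  |= lbl[1:, :] != lbl[:-1, :]
    edge[:, :-1] |= lbl[:, :-1] != lbl[:, 1:]
    edge[:, 1:]  |= lbl[:, 1:] != lbl[:, :-1]
    # Dilate the edge set `width` times to obtain a band around it.
    band = edge.copy()
    for _ in range(width):
        grown = band.copy()
        grown[:-1, :] |= band[1:, :]
        grown[1:, :]  |= band[:-1, :]
        grown[:, :-1] |= band[:, 1:]
        grown[:, 1:]  |= band[:, :-1]
        band = grown
    return band
```

In practice a morphological dilation (e.g., `scipy.ndimage.binary_dilation`) would serve the same purpose; the explicit shifts above only keep the sketch dependency-free.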
We have conducted an intensive experimental study on two datasets, the PASCAL VOC 2012 dataset and the POTSDAM dataset. PASCAL VOC 2012 is the only dataset provided with scribble annotations; it contains many images with only one large object in the center of the image. To illustrate that scribble annotations can be applied to scene segmentation datasets with more complex images, and to show the effectiveness of our proposed method, we have manually annotated the POTSDAM dataset with scribbles for comparisons on remote sensing scenarios. Compared with PASCAL VOC 2012, each remote sensing image in POTSDAM contains many objects, and some objects have ambiguous boundaries and no well-defined shape. Scribbles can significantly alleviate the burden of collecting pixel-level annotations, but they also pose more significant challenges for semantic segmentation in remote sensing scenarios, as shown in Fig. 2.
To summarise, our main contributions are:
- 1.
We propose a novel method, BoundaryMix, for scribble-supervised semantic segmentation. It supplements the missing boundary information of scribble annotations by generating pseudo training images and annotations.
- 2.
We use scribbles to annotate the remote sensing images in the POTSDAM dataset and show through our experiments that scribble annotation is also suitable for different scenarios. The scribble annotations will be made publicly available.
- 3.
We conduct experiments on PASCAL VOC 2012 and POTSDAM datasets with scribble annotations. The experimental results show that our proposed method achieves superior performance and almost closes the gap between weakly-supervised and fully-supervised image segmentation.
Weakly-supervised semantic segmentation
Weakly-supervised semantic segmentation is proposed for alleviating the burden of collecting pixel-level annotations. Common weak annotations include bounding boxes [18], [19], [20], scribbles [14], [15], [16], [17], points [12], [13], and image-level annotations [8], [9], [10], [11]. In this paper, we explore the issue of weakly-supervised segmentation using scribble annotations. Existing methods can be roughly divided into three categories: using graph-based methods to generate pseudo annotations,
Scribble-supervised semantic segmentation
Scribble annotation is a convenient way to allow a user to specify the object-of-interest. It is drawn in a few strokes inside the object, as shown in Fig. 1(c). Scribble-supervised semantic segmentation uses scribbles as the only supervision for training a segmentation model. For pixels on the scribbles, the ground-truth classes are given.
Formally, we assume that the training data consists of images and their corresponding annotations. For scribble supervision, not
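Since ground-truth classes are given only for pixels on the scribbles, scribble-supervised training typically evaluates the cross-entropy loss on the annotated pixels alone and ignores the rest. A minimal NumPy sketch, assuming the common convention of an "ignore" label value (255 here is an assumption, not necessarily the paper's constant):

```python
import numpy as np

IGNORE = 255  # conventional "unlabelled pixel" value; an illustrative assumption

def partial_cross_entropy(probs, labels, ignore=IGNORE):
    """Mean cross-entropy over labelled (scribble) pixels only.

    probs:  (H, W, C) softmax probabilities
    labels: (H, W) integer class ids, `ignore` for unlabelled pixels
    """
    labels = np.asarray(labels)
    mask = labels != ignore
    if not mask.any():
        return 0.0
    h, w = np.nonzero(mask)
    picked = probs[h, w, labels[h, w]]   # probability of the true class
    return float(-np.mean(np.log(picked + 1e-12)))
```

In a deep-learning framework the same effect is usually achieved with the loss function's ignore-label option (e.g., `ignore_index` in PyTorch's `CrossEntropyLoss`).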
Creating pseudo samples with the BoundaryMix operation
Motivated by the above observation and analysis, in this paper, we propose a simple-but-effective solution named BoundaryMix to supplement the missing annotations around the boundary regions. Our idea is to create pseudo training samples that are close to the original images but with less boundary annotation ambiguity. We propose to achieve this by selectively mixing the content and the segmentation predictions of two images to create a pseudo image and its annotation, as shown in Fig. 4.
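The mixing step itself amounts to replacing the pixels (and the corresponding predictions) inside one sample's boundary band with those of a second sample. This is a minimal sketch of that idea, not the paper's exact procedure; the function and argument names are illustrative:

```python
import numpy as np

def boundary_mix(img_a, pred_a, img_b, pred_b, band_mask):
    """Sketch of the BoundaryMix idea: pixels inside the boundary band of
    sample A are replaced by the corresponding pixels and predictions of
    sample B; everything else is kept from sample A.

    img_a, img_b:   (H, W, 3) images
    pred_a, pred_b: (H, W) predicted label maps
    band_mask:      (H, W) bool, True near A's estimated semantic boundaries
    """
    m3 = band_mask[..., None]               # broadcast over colour channels
    mixed_img = np.where(m3, img_b, img_a)  # pseudo training image
    mixed_lbl = np.where(band_mask, pred_b, pred_a)  # pseudo annotation
    return mixed_img, mixed_lbl
```

Because only the narrow band around A's estimated boundaries is replaced, the pseudo pair stays close to the original sample while the most error-prone predicted pixels are discarded.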
Dataset and evaluation metric
Our proposed method is trained and evaluated on two datasets: PASCAL VOC 2012 [35] and POTSDAM. PASCAL VOC 2012 is a commonly used dataset for semantic segmentation, and it is also the only dataset that has been provided with scribble annotations. To illustrate that scribble annotations can be applied in a wider range of applications, and to show the effectiveness of our method, we use scribbles to annotate remote sensing images in the
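Segmentation results on benchmarks such as PASCAL VOC 2012 are commonly reported as mean intersection-over-union (mIoU); assuming that is the metric used here, a minimal sketch of its per-image computation:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:          # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

Benchmark evaluation normally accumulates the per-class intersections and unions over the whole test set before averaging, rather than averaging per-image scores.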
Conclusion
In this work, we investigate the limitations of scribble annotations and propose a simple-but-effective approach, BoundaryMix, for scribble-supervised semantic segmentation. Specifically, it generates pseudo image-annotation pairs that have less boundary annotation ambiguity to supplement the missing boundary annotation of scribble. In addition to this, we provide the scribble annotations for a remote sensing dataset, POTSDAM, to illustrate that scribble annotation can be applied in more
Funding
The work was supported by the National Natural Science Foundation of China under Grant 61725105.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work was supported by the National Natural Science Foundation of China under Grant 61725105. The authors would also like to thank the anonymous reviewers for their very competent comments and helpful suggestions.
References (40)
- Deep gated attention networks for large-scale street-level scene segmentation, Pattern Recognit. (2019)
- Semantic-aware scene recognition, Pattern Recognit. (2020)
- Contextual deconvolution network for semantic segmentation, Pattern Recognit. (2020)
- Semantic segmentation using stride spatial pyramid pooling and dual attention decoder, Pattern Recognit. (2020)
- Learning to segment with image-level annotations, Pattern Recognit. (2016)
- Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, Workshop on Challenges in Representation Learning, ICML (2013)
- The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Sci. Data (2018)
- Dense semantic labeling of subdecimeter resolution images with convolutional neural networks, IEEE Trans. Geosci. Remote Sens. (2016)
- Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
- Seed, expand and constrain: Three principles for weakly-supervised image segmentation, European Conference on Computer Vision (2016)
- Integral object mining via online attention accumulation, Proceedings of the IEEE International Conference on Computer Vision
- Object region mining with adversarial erasing: A simple classification to semantic segmentation approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- What's the point: Semantic segmentation with point supervision, European Conference on Computer Vision
- Weakly supervised scene parsing with point-based distance metric learning, Proceedings of the AAAI Conference on Artificial Intelligence
- ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- On regularized losses for weakly-supervised CNN segmentation, Proceedings of the European Conference on Computer Vision (ECCV)
- Boundary perception guidance: A scribble-supervised semantic segmentation approach, Proceedings of the 28th International Joint Conference on Artificial Intelligence
- BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation, Proceedings of the IEEE International Conference on Computer Vision
- Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Wanxuan Lu received the B.Sc. degree from Beijing Institute of Technology, Beijing, China, in 2016. He is currently pursuing the Ph.D. degree with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, and remote sensing image processing.
Kun Fu received the B.Sc., M.Sc., and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1995, 1999, and 2002, respectively. He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China. His research interests include computer vision, remote sensing image understanding, geospatial data mining, and visualization.
Dong Gong received the Ph.D. and B.S. degree in computer science from Northwestern Polytechnical University, Xi’an, China in 2018 and 2012, respectively. He is currently a research fellow at The University of Adelaide, Australia. He was a joint-training Ph.D. student with The University of Adelaide from 2015 to 2016. His research interests include machine learning and optimization techniques and their applications in image processing and computer vision.
Xian Sun received the B.Sc. degree from Beihang University, Beijing, China, in 2004, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, in 2006 and 2009, respectively. He is currently a Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote sensing image understanding.
Wenhui Diao received the B.Sc. degree from Xidian University, Xi’an, China, in 2011, and the M.Sc. and Ph.D. degrees from the Institute of Electronics, Chinese Academy of Sciences, Beijing, China, in 2016. He is currently an Assistant Professor with the Aerospace Information Research Institute, Chinese Academy of Sciences. His research interests include computer vision and remote sensing image analysis.
Lingqiao Liu received the BS and MS degrees in communication engineering from the University of Electronic Science and Technology of China, Chengdu, in 2006 and 2009, respectively, and the Ph.D. degree from the Australian National University, Canberra, in 2014. He is now a Senior Lecturer at the University of Adelaide. In 2016, he was awarded the Discovery Early Career Researcher Award from the Australian Research Council. His current research interests include low-supervision learning and various topics in computer vision and natural language processing.
☆ Work was done while W. Lu was visiting the University of Adelaide. Correspondence should be addressed to L. Liu and K. Fu.