End-to-end weakly supervised semantic segmentation with reliable region mining
Introduction
Recently, weakly supervised semantic segmentation has received great interest and has been extensively studied. Requiring only cheap and simple low-degree annotations, including scribbles [1], [2], [3], bounding boxes [4], [5], points [6], [7] and image-level labels [8], [9], [10], for training, weakly supervised semantic segmentation offers a much easier alternative to its fully supervised counterpart, which relies on pixel-level masks [11]. Among these weakly supervised labels, the image-level annotation is the easiest to collect but also the most challenging case, since there is no direct mapping between semantic labels and pixels.
To learn semantic segmentation models using image-level labels as supervision, existing approaches can be categorized into one-step and two-step approaches. One-step approaches [12] often establish an end-to-end framework that augments multi-instance learning with other constrained strategies for optimization. This family of methods is elegant and easy to implement. However, one significant drawback of these approaches is that their segmentation accuracy falls far behind that of fully supervised counterparts. To achieve better segmentation performance, many researchers alternatively propose two-step approaches [13], [14]. This family of approaches usually takes bottom-up [15] or top-down [16], [17] strategies to first generate high-quality pseudo pixel-level masks with image-level labels as supervision. These pseudo masks then act as ground truth and are fed into off-the-shelf fully convolutional networks, such as the Fully Convolutional Network (FCN) [18] and DeepLab [19], [20], to train the semantic segmentation models. The state-of-the-art methods are mainly two-step approaches, with segmentation performance approaching that of their fully supervised counterparts. However, to produce high-quality pseudo masks, these approaches often employ many bells and whistles, such as introducing additional object/background cues from object proposals [21] or saliency maps [22] in an off-line manner. As a result, two-step approaches are usually complicated and hard to re-implement, limiting their application in research areas such as object localization and video object tracking.
In this paper, we extend our previous work [23] and present a simple yet effective one-step approach, called Reliable Region Mining (RRM), which can be easily trained in an end-to-end manner. It includes two branches: one to produce pseudo pixel-level masks using image-level annotations, and the other to produce the semantic segmentation results. In contrast to previous two-step methods [8], [24], [25], [26] that prefer to mine dense and integral object regions, our RRM only leverages those reliable object/background regions that are usually tiny but have high response scores on the class activation maps. We find that these regions can be further pruned into more reliable ones by an additional Conditional Random Field (CRF) operation, and they are then employed as supervision for the parallel semantic segmentation branch. We design two parallel sub-branches for the segmentation branch: one extracts local information using regular convolution layers, and the other extracts global information with our proposed Re-weighting Feature-Attention Module (R-FAM). More importantly, with limited pixels as supervision, we design a new joint training loss, including a pixel-wise cross entropy loss, a regularized loss named dense energy loss, and a Batch-based Class Distance loss (BCD loss), to optimize the training process. The dense energy loss exploits shallow features such as RGB color and spatial information, while the BCD loss makes the high-level semantic features more discriminative across classes. With the help of the newly designed joint loss and R-FAM, our one-step RRM achieves 65.4% and 65.3% mIoU on the Pascal VOC val and test sets, respectively. These results are state-of-the-art among one-step methods and are even competitive with two-step state-of-the-art methods, which usually adopt complex bells and whistles to produce pseudo masks.
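To make the region-mining idea above concrete, the following is a minimal NumPy sketch of how reliable pixels might be selected from normalised class activation maps, keeping only very high responses as foreground and very low responses as background while marking everything else as ignored. The function name and threshold values are illustrative assumptions, not the paper's exact settings, and the subsequent CRF pruning step is omitted:

```python
import numpy as np

def mine_reliable_regions(cams, fg_thresh=0.7, bg_thresh=0.05, ignore_index=255):
    """Pick only high-confidence pixels from class activation maps (CAMs).

    cams: (C, H, W) array of per-class activations, normalised to [0, 1].
    Returns an (H, W) label map in which unreliable pixels are marked
    `ignore_index`; class 0 is treated as background.
    """
    peak = cams.max(axis=0)            # strongest class response per pixel
    labels = cams.argmax(axis=0) + 1   # foreground class ids start at 1
    out = np.full(peak.shape, ignore_index, dtype=np.int64)
    out[peak > fg_thresh] = labels[peak > fg_thresh]   # reliable foreground
    out[peak < bg_thresh] = 0                          # reliable background
    return out
```

Pixels with intermediate responses receive the ignore label and contribute nothing to the cross-entropy term, which is what makes the mined supervision tiny but reliable.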
We believe that our proposed RRM offers new insight into the one-step solution for weakly supervised semantic segmentation. Furthermore, to demonstrate the effectiveness of our method, we also extend it to a two-step framework and obtain a new state-of-the-art performance, with 69.3% and 69.2% mIoU on the Pascal VOC val and test sets, respectively.
Our contributions are summarized as:
- We design an elegant and efficient end-to-end network for weakly supervised semantic segmentation. Relying on tiny but reliable pixel-level pseudo labels, our network can be trained in a one-stage manner given image-level labels, without bells and whistles.
- We propose two new loss functions for utilizing the reliable labels: a new dense energy loss and a batch-based class distance (BCD) loss. The former relies on shallow features, whilst the latter focuses on distinguishing high-level semantic features of different classes.
- We design a new attention module (R-FAM) to extract comprehensive global information. By using a re-weighting technique, our R-FAM can suppress dominant or noisy attention values, so our semantic segmentation branch can aggregate sufficient global information.
- Our end-to-end approach achieves competitive performance (val: 65.4%, test: 65.3%) compared to two-step approaches on the PASCAL VOC 2012 dataset. By extending our network to a two-step solution, our approach achieves a new state-of-the-art performance (val: 69.3%, test: 69.2%).
Related work
Semantic segmentation is an important task in computer vision [27], [28], [29], requiring pixel-level classification. Long et al. [18] proposed the first Fully Convolutional Network (FCN) for semantic segmentation. Chen et al. [19] proposed a new deep neural network structure named “DeepLab” to conduct pixel-wise prediction using dilated convolution, and a series of new network structures were developed after that [11], [20], [30]. Kim et al. proposed a level set loss for
Overview
Our proposed RRM consists of two parallel branches: a classification branch and a semantic segmentation branch. Both branches share the same backbone network, and during training both update the whole network at the same time. The overall framework of our method is illustrated in Fig. 1. The algorithm flow is illustrated in Algorithm 1.
- The classification branch is used to generate reliable pixel-level annotations. Original CAMs will be processed to generate tiny
Dataset and implementation details
Dataset. Our RRM model is trained and validated on PASCAL VOC 2012 [45] as well as its augmented data, including 10,582 images for training, 1,449 images for validation and 1,456 images for testing. The Mean Intersection over Union (mIoU) is adopted as the evaluation criterion.
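The mIoU criterion used above can be sketched as follows: per-class intersection-over-union, averaged over the classes that appear in either the prediction or the ground truth. This is the standard definition rather than code from the paper:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

On PASCAL VOC the official evaluation accumulates intersections and unions over the whole dataset before dividing, so this per-image sketch is an approximation of that protocol.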
Implementation Details. The backbone network is a ResNet model with 38 convolution layers [46]. We remove all the fully connected layers of the original network and apply dilated convolution to the last three ResNet
Discussion
There are several possible solutions to improve the current approach: (1) Making the classification branch and segmentation branch benefit from each other. In the current framework, there is no feedback from the segmentation branch to the classification branch. The segmentation branch only receives the reliable label from the classification branch and then makes predictions. Since the quality of the predictions from the segmentation branch is high, we can attempt to use them to refine the
Conclusion
In this paper, we proposed the Reliable Region Mining model, an end-to-end network for image-level weakly supervised semantic segmentation. We revisited drawbacks of the state-of-the-art methods, which adopt the two-step approach. We proposed a one-step approach through mining tiny reliable regions and used them as ground-truth labels directly for our segmentation branch training. With limited pixels as supervision, we designed a dense energy loss and a batch-based class distance loss, which
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by the National Key R&D Program of China (No. 2021ZD0112100), and the National Natural Science Foundation of China (Nos. U1936212, 62120106009, 61972323, 61876155).
References (68)
- et al., IAN: the individual aggregation network for person search, Pattern Recognit. (2019)
- et al., Wider or deeper: revisiting the ResNet model for visual recognition, Pattern Recognit. (2019)
- et al., BoundaryMix: generating pseudo-training images for improving segmentation with scribble annotations, Pattern Recognit. (2021)
- et al., Weakly-supervised semantic segmentation with saliency and incremental supervision updating, Pattern Recognit. (2021)
- et al., ScribbleSup: scribble-supervised convolutional networks for semantic segmentation, CVPR (2016)
- et al., Learning random-walk label propagation for weakly-supervised semantic segmentation, CVPR (2017)
- et al., On regularized losses for weakly-supervised CNN segmentation, ECCV (2018)
- et al., BoxSup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation, CVPR (2015)
- et al., Simple does it: weakly supervised instance and semantic segmentation, CVPR (2017)
- et al., Deep extreme cut: from extreme points to object segmentation, CVPR (2018)
- What's the point: semantic segmentation with point supervision, ECCV
- Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation, CVPR
- Self-erasing network for integral object attention, NeurIPS
- Revisiting dilated convolution: a simple approach for weakly- and semi-supervised semantic segmentation, CVPR
- Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587
- Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, http://arxiv.org/abs/1502
- Object region mining with adversarial erasing: a simple classification to semantic segmentation approach, CVPR
- Weakly-supervised semantic segmentation network with deep seeded region growing, CVPR
- Deeply supervised salient object detection with short connections, CVPR
- Top-down neural attention by excitation backprop, IJCV
- Learning deep features for discriminative localization, CVPR
- Fully convolutional networks for semantic segmentation, CVPR
- Semantic image segmentation with deep convolutional nets and fully connected CRFs, arXiv preprint arXiv:1412.7062
- DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. PAMI
- From image-level to pixel-level labeling with convolutional networks, CVPR
- Salient object detection: a discriminative regional feature integration approach, CVPR
- Reliability does matter: an end-to-end weakly supervised semantic segmentation approach, Proceedings of the AAAI Conference on Artificial Intelligence
- FickleNet: weakly and semi-supervised semantic image segmentation using stochastic inference, arXiv preprint arXiv:1902.10421
- Self-supervised difference detection for weakly-supervised semantic segmentation, ICCV
- Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation, arXiv preprint arXiv:2004.04581
- Correlation filter selection for visual tracking using reinforcement learning, arXiv preprint arXiv:1811.03196
- Multi-level adversarial network for domain adaptive semantic segmentation, Pattern Recognit.
- Encoder-decoder with atrous separable convolution for semantic image segmentation, ECCV
- Robust visual saliency optimization based on bidirectional Markov chains, Cognit. Comput.
Bingfeng Zhang received the B.S. degree in electronic information engineering from China University of Petroleum (East China), Qingdao, PR China, in 2015, and the M.E. degree in systems, control and signal processing from the University of Southampton, Southampton, U.K., in 2016. He is now a Ph.D. student at the University of Liverpool, Liverpool, U.K., and also a Ph.D. student in the School of Advanced Technology of Xi’an Jiaotong-Liverpool University, Suzhou, PR China. His current research interests are weakly supervised semantic segmentation and few-shot segmentation.
Jimin Xiao received the B.S. and M.E. degrees in telecommunication engineering from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2004 and 2007, respectively, and the Ph.D. degree in electrical engineering and electronics from the University of Liverpool, Liverpool, U.K., in 2013. From 2013 to 2014, he was a Senior Researcher with the Department of Signal Processing, Tampere University of Technology, Tampere, Finland, and an External Researcher with the Nokia Research Center, Tampere. Since 2014, he has been a Faculty Member with Xi’an Jiaotong Liverpool University, Suzhou, China. His research interests include image and video processing, computer vision, and deep learning.
Yunchao Wei is currently a Professor at Beijing Jiaotong University. He received his PhD degree from Beijing Jiaotong University in 2016. Before joining UTS, he was a Postdoc Researcher in Prof. Thomas Huang’s Image Formation and Processing (IFP) group at the Beckman Institute, UIUC, from 2017 to 2019. He has published over 60 papers in top-tier journals and conferences (e.g., TPAMI, CVPR, ICCV, etc.), with 3900+ Google citations. He received the Excellent Doctoral Dissertation Award of CIE in 2016, the ARC Discovery Early Career Researcher Award in 2019, and the 1st Prize in Science and Technology awarded by the China Society of Image and Graphics in 2019. His research interests mainly include deep learning and its applications in computer vision, e.g., image classification, video/image object detection/segmentation, and learning with imperfect data. He has organized multiple Workshops and Tutorials at CVPR, ICCV, ECCV and ACM MM.
Kaizhu Huang is currently a Professor at Duke Kunshan University, China. Prof. Huang obtained his PhD degree from the Chinese University of Hong Kong (CUHK) in 2004. He worked at Fujitsu Research Centre, CUHK, the University of Bristol, and the National Laboratory of Pattern Recognition, Chinese Academy of Sciences from 2004 to 2012. Prof. Huang has been working in machine learning, neural information processing, and pattern recognition. He was the recipient of the 2011 Asia Pacific Neural Network Society Young Researcher Award. He received best paper or book awards five times. As of September 2020, he has published 9 books and over 190 international research papers (70+ in international journals), e.g., in journals (JMLR, Neural Computation, IEEE T PAMI, IEEE T NNLS, IEEE T BME, IEEE T Cybernetics) and conferences (NeurIPS, IJCAI, SIGIR, UAI, CIKM, ICDM, ICML, ECML, CVPR). He serves as associate editor/advisory board member for a number of journals and book series. He has been invited as a keynote speaker at more than 20 international conferences or workshops.
Shan Luo is a Lecturer (Assistant Professor) at the Department of Computer Science, University of Liverpool. Previous to Liverpool, he was a Research Fellow at Harvard University and University of Leeds. He was also a Visiting Scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT. He received the B.Eng. degree in Automatic Control from China University of Petroleum, Qingdao, China, in 2012. He was awarded the Ph.D. degree in Robotics from King’s College London, UK, in 2016. His research interests include tactile sensing, object recognition and computer vision.
Yao Zhao received the B.S. degree from the Radio Engineering Department, Fuzhou University, Fuzhou, China, in 1989, the M.E. degree from the Radio Engineering Department, Southeast University, Nanjing, China, in 1992, and the Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University (BJTU), Beijing, China, in 1996, where he became an Associate Professor and a Professor in 1998 and 2001, respectively. From 2001 to 2002, he was a Senior Research Fellow with the Information and Communication Theory Group, Faculty of Information Technology and Systems, Delft University of Technology, Delft, The Netherlands. In 2015, he visited the Swiss Federal Institute of Technology, Lausanne (EPFL), Switzerland. From 2017 to 2018, he visited the University of Southern California. He is currently the Director of the Institute of Information Science, BJTU.

His current research interests include image/video coding, digital watermarking and forensics, video analysis and understanding, and artificial intelligence. Dr. Zhao is a Fellow of the IET. He serves on the Editorial Boards of several international journals, including as an Associate Editor for the IEEE TRANSACTIONS ON CYBERNETICS, a Senior Associate Editor for the IEEE SIGNAL PROCESSING LETTERS, and an Area Editor for Signal Processing: Image Communication. He was named a Distinguished Young Scholar by the National Science Foundation of China in 2010 and was elected a Chang Jiang Scholar of the Ministry of Education of China in 2013.