CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

Junao Shen; Kun Kuang; Jiaheng Wang; Xinyu Wang; Tian Feng; Wei Zhang

doi:10.1609/aaai.v38i5.28280

Authors

Junao Shen, School of Software Technology, Zhejiang University
Kun Kuang, College of Computer Science and Technology, Zhejiang University
Jiaheng Wang, School of Software Technology, Zhejiang University
Xinyu Wang, School of Software Technology, Zhejiang University
Tian Feng, School of Software Technology, Zhejiang University
Wei Zhang, School of Software Technology, Zhejiang University; Innovation Center of Yangtze River Delta, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i5.28280

Keywords:

CV: Segmentation, CV: Bias, Fairness & Privacy, CV: Image and Video Retrieval, CV: Vision for Robotics & Autonomous Driving

Abstract

Few-shot semantic segmentation (FSS) aims to segment unseen objects in a query image using a few pixel-wise annotated support images, thus expanding the capabilities of semantic segmentation. The main challenge lies in extracting sufficient information from the limited support images to guide the segmentation process. Conventional methods typically address this problem by generating single or multiple prototypes from the support images and calculating their cosine similarity to the query image. However, these methods often fail to capture meaningful information for modeling the de facto joint distribution of pixel and category. Consequently, they produce incomplete segmentation of foreground objects and mis-segmentation of complex backgrounds. To overcome this issue, we propose the Cross Gaussian Mixture Generative Model (CGMGM), a novel FSS method based on Gaussian Mixture Models (GMMs), which establishes the joint distribution of pixel and category in both the support and query images. Specifically, our method first matches the feature representations of the query image with those of the support images to generate and refine an initial segmentation mask. It then employs GMMs to model the joint distribution of foreground and background using the support masks and the initial segmentation mask. Subsequently, a parametric decoder uses the posterior probability of pixels in the query image, obtained by applying Bayes' theorem to the joint distribution, to generate the final segmentation mask. Experimental results on the PASCAL-5^i and COCO-20^i datasets demonstrate the effectiveness of CGMGM and its superior performance compared with state-of-the-art methods.
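
The abstract does not give formulas for the posterior step, so the following is only a minimal sketch of the general idea: given one GMM for foreground features and one for background features, Bayes' theorem yields a per-pixel foreground probability. It is not the authors' implementation. The diagonal covariances, the hand-set toy parameters, the equal class prior, and the function names `gmm_log_likelihood` and `foreground_posterior` are all assumptions made for illustration. In the paper the GMMs are built from support and query features, and the posterior is passed to a parametric decoder rather than used directly.

```python
import numpy as np


def gmm_log_likelihood(x, weights, means, variances):
    """Return log p(x | class) under a diagonal-covariance GMM (assumed form).

    x: (N, D) pixel features; weights: (K,); means, variances: (K, D).
    """
    diff = x[:, None, :] - means[None, :, :]                      # (N, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None], axis=2)
    a = log_comp + np.log(weights)[None, :]                       # (N, K)
    # Log-sum-exp over mixture components, shifted for numerical stability.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def foreground_posterior(x, fg_gmm, bg_gmm, prior_fg=0.5):
    """Bayes' theorem: p(fg | x) = p(x | fg) p(fg) / sum_c p(x | c) p(c)."""
    log_fg = gmm_log_likelihood(x, *fg_gmm) + np.log(prior_fg)
    log_bg = gmm_log_likelihood(x, *bg_gmm) + np.log(1.0 - prior_fg)
    # 1 / (1 + exp(log_bg - log_fg)), computed stably in log space.
    return np.exp(-np.logaddexp(0.0, log_bg - log_fg))


# Toy 2-D "features": one component per class, unit variance (illustrative only).
fg_gmm = (np.array([1.0]), np.array([[2.0, 2.0]]), np.ones((1, 2)))
bg_gmm = (np.array([1.0]), np.array([[-2.0, -2.0]]), np.ones((1, 2)))
pixels = np.array([[2.0, 2.0], [-2.0, -2.0], [0.0, 0.0]])

posterior = foreground_posterior(pixels, fg_gmm, bg_gmm)
print(posterior)  # high for the first pixel, low for the second, 0.5 at the midpoint
```

Working in log space avoids underflow when feature dimensions are large, which is the usual situation for deep feature maps. A fitted mixture (for example from an EM routine) could replace the hand-set parameters without changing the posterior computation.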
