CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

Authors

  • Junao Shen, School of Software Technology, Zhejiang University
  • Kun Kuang, College of Computer Science and Technology, Zhejiang University
  • Jiaheng Wang, School of Software Technology, Zhejiang University
  • Xinyu Wang, School of Software Technology, Zhejiang University
  • Tian Feng, School of Software Technology, Zhejiang University
  • Wei Zhang, School of Software Technology, Zhejiang University; Innovation Center of Yangtze River Delta, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i5.28280

Keywords:

CV: Segmentation, CV: Bias, Fairness & Privacy, CV: Image and Video Retrieval, CV: Vision for Robotics & Autonomous Driving

Abstract

Few-shot semantic segmentation (FSS) aims to segment unseen objects in a query image using a few pixel-wise annotated support images, thus expanding the capabilities of semantic segmentation. The main challenge lies in extracting sufficient information from the limited support images to guide the segmentation process. Conventional methods typically address this problem by generating single or multiple prototypes from the support images and calculating their cosine similarity to the query image. However, these methods often fail to capture meaningful information for modeling the de facto joint distribution of pixel and category. Consequently, they produce incomplete segmentation of foreground objects and mis-segmentation of the complex background. To overcome this issue, we propose the Cross-Gaussian Mixture Generative Model (CGMGM), a novel FSS method based on Gaussian Mixture Models (GMMs), which establishes the joint distribution of pixel and category in both the support and query images. Specifically, our method first matches the feature representations of the query image with those of the support images to generate and refine an initial segmentation mask. It then employs GMMs to model the joint distribution of foreground and background using the support masks and the initial segmentation mask. Subsequently, a parametric decoder generates the final segmentation mask from the posterior probability of pixels in the query image, obtained by applying Bayes' theorem to the joint distribution. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate the effectiveness of CGMGM and its superior performance compared with state-of-the-art methods.
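
The generative idea in the abstract (class-conditional GMMs for foreground and background, followed by Bayes' theorem to obtain a per-pixel posterior) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy 4-dimensional features, the diagonal covariances, the equal class prior, and the function names `fit_diag_gmm`, `component_log_probs`, `log_likelihood`, and `foreground_posterior` are all assumptions made for illustration, and the paper's feature matching, mask refinement, and parametric decoder are omitted.

```python
import numpy as np


def component_log_probs(x, weights, means, variances):
    """Return log(pi_k * N(x | mu_k, diag(var_k))) with shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)
    log_exp = -0.5 * ((diff ** 2) / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp


def fit_diag_gmm(x, k=2, iters=30, seed=0):
    """Fit a k-component diagonal-covariance GMM to x of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    means = x[rng.choice(n, size=k, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-3, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each feature vector.
        log_p = component_log_probs(x, weights, means, variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-8
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        second_moment = (resp.T @ (x ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-3)
    return weights, means, variances


def log_likelihood(x, gmm):
    """Log p(x | class) under a fitted GMM, computed with a stable log-sum-exp."""
    log_p = component_log_probs(x, *gmm)
    m = log_p.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True)))[:, 0]


def foreground_posterior(query, fg_gmm, bg_gmm, prior_fg=0.5):
    """Bayes' theorem: p(fg | x) = p(x | fg) p(fg) / sum_c p(x | c) p(c)."""
    log_fg = log_likelihood(query, fg_gmm) + np.log(prior_fg)
    log_bg = log_likelihood(query, bg_gmm) + np.log(1.0 - prior_fg)
    return np.exp(log_fg - np.logaddexp(log_fg, log_bg))


# Toy stand-ins for masked support features (assumed, not from the paper).
rng = np.random.default_rng(0)
fg_feats = rng.normal(2.0, 0.5, size=(200, 4))
bg_feats = rng.normal(-2.0, 0.5, size=(200, 4))

# Query "pixels": the first five resemble foreground, the last five background.
query = np.vstack([
    rng.normal(2.0, 0.5, size=(5, 4)),
    rng.normal(-2.0, 0.5, size=(5, 4)),
])

post = foreground_posterior(query, fit_diag_gmm(fg_feats), fit_diag_gmm(bg_feats))
mask = post > 0.5  # hard foreground decision per query pixel
```

In this sketch the posterior is thresholded directly to obtain a mask, whereas the abstract states that a parametric decoder consumes the posterior to produce the final segmentation.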

Published

2024-03-24

How to Cite

Shen, J., Kuang, K., Wang, J., Wang, X., Feng, T., & Zhang, W. (2024). CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4784-4792. https://doi.org/10.1609/aaai.v38i5.28280

Issue

Vol. 38 No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV