DOI: 10.1145/3394171.3413944
research-article

Meta Parsing Networks: Towards Generalized Few-shot Scene Parsing with Adaptive Metric Learning

Published: 12 October 2020

ABSTRACT

Recent work on few-shot segmentation typically aims to segment novel objects using a few annotated examples as guidance. In this work, we advance the few-shot segmentation paradigm towards a more challenging yet more general scenario, namely Generalized Few-shot Scene Parsing (GFSP). In this task, a fully annotated image serves as guidance for segmenting all pixels in a query image. Our goal is to learn a generalizable and robust segmentation network from the meta-learning perspective so that both seen and unseen categories can be correctly recognized. Unlike previous practice, this task performs segmentation on a joint label space consisting of both previously seen and novel categories. Moreover, pixels from these multiple categories must be taken into account simultaneously, a setting that has not been well explored. Accordingly, we present Meta Parsing Networks (MPNet) to better exploit the guidance information in the support set. MPNet contains two basic modules: the Adaptive Deep Metric Learning (ADML) module and the Contrastive Inter-class Distraction (CID) module. Specifically, ADML takes the annotated pixels from the support image as guidance and adaptively produces high-quality prototypes for learning a deep comparison metric. In addition, the CID module learns to enlarge the feature discrepancy between different categories in the embedding space, leading MPNet to generate more discriminative feature embeddings. We conduct experiments on two newly constructed benchmarks, GFSP-Cityscapes and GFSP-Pascal-Context. Extensive ablation studies demonstrate the effectiveness and generalization ability of MPNet.
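To make the prototype idea in the abstract concrete, the following is a minimal sketch in plain Python and not the authors' implementation. The abstract does not say how ADML builds its prototypes or which metric it learns, so the sketch assumes simple per-class averaging of support pixel embeddings and a fixed cosine similarity, as is common in prototype-based few-shot segmentation. The hinge-style penalty standing in for the CID objective, the function names, the margin value, and the toy 2-D embeddings and class names are all hypothetical.

```python
import math


def class_prototypes(features, labels):
    """Average the support pixel embeddings of each class (assumed pooling rule)."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity; a fixed stand-in for the learned comparison metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def parse_query(query_features, prototypes):
    """Label every query pixel with the class of its most similar prototype."""
    return [
        max(prototypes, key=lambda lab: cosine(vec, prototypes[lab]))
        for vec in query_features
    ]


def inter_class_penalty(prototypes, margin=0.5):
    """Hypothetical hinge penalty on prototype pairs more similar than `margin`."""
    labs = sorted(prototypes)
    total = 0.0
    for i, a in enumerate(labs):
        for b in labs[i + 1:]:
            total += max(0.0, cosine(prototypes[a], prototypes[b]) - margin)
    return total


# Toy data: four annotated support pixels and two query pixels (made-up values).
support_features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
support_labels = ["road", "road", "rider", "rider"]
query_features = [[0.9, 0.1], [0.1, 0.7]]

prototypes = class_prototypes(support_features, support_labels)
predicted = parse_query(query_features, prototypes)
penalty = inter_class_penalty(prototypes)
```

Because every class annotated in the support image contributes a prototype, the same nearest-prototype rule covers seen and novel categories alike, which is the joint label space the abstract describes. In a trained network the embeddings would come from a learned feature extractor, and a penalty of this kind would be minimized during training rather than merely evaluated.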

Supplemental Material

3394171.3413944.mp4 (mp4, 58.3 MB)


    • Published in

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
      ISBN: 9781450379885
      DOI: 10.1145/3394171

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
