DOI: 10.1145/3664647.3681132
research-article
Open access

Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning

Published: 28 October 2024

Abstract

Camouflaged instance segmentation (CIS) aims to detect and segment objects that blend with their surroundings. Existing CIS methods rely heavily on fully supervised training with massive amounts of precisely annotated data, which consumes considerable annotation effort, yet they still struggle to segment highly camouflaged objects accurately. Despite their visual similarity to the background, camouflaged objects differ from it semantically. Since text associated with images offers explicit semantic cues that underscore this difference, we propose TPNet, the first text-prompt-based weakly supervised camouflaged instance segmentation method, which leverages semantic distinctions for effective segmentation. TPNet operates in two stages: pseudo mask generation and self-training. In the first stage, we align text prompts with images using a language-image model to obtain region proposals containing camouflaged instances. A Semantic-Spatial Iterative Fusion module then combines spatial information with semantic cues, iteratively refining the pseudo masks. In the second stage, Graduated Camouflage Learning, a self-training strategy, orders training from simple to complex images according to their camouflage levels, so that the model learns progressively. Comprehensive experiments on two common benchmarks show that the two stages together deliver a significant advance, offering a novel solution that bridges the gap between weakly supervised learning and highly camouflaged instance segmentation.
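The easy-to-hard ordering described for the second stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how camouflage levels are measured or how the training pools are formed, so the `camouflage_score` field, the `graduated_stages` helper, and the choice of cumulative pools are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    image_id: str
    # Hypothetical difficulty score in [0, 1]; higher means more camouflaged.
    camouflage_score: float


def graduated_stages(samples: Sequence[Sample], num_stages: int) -> List[List[Sample]]:
    """Split samples into cumulative training pools ordered from easy to hard.

    Stage k contains the easiest ceil(k / num_stages) fraction of the data,
    so the final stage covers every sample. This pooling rule is an assumption.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=lambda s: s.camouflage_score)
    stages = []
    for k in range(1, num_stages + 1):
        # Integer ceiling of len(ordered) * k / num_stages.
        cutoff = (len(ordered) * k + num_stages - 1) // num_stages
        stages.append(ordered[:cutoff])
    return stages


if __name__ == "__main__":
    data = [
        Sample("a", 0.9),
        Sample("b", 0.1),
        Sample("c", 0.5),
        Sample("d", 0.3),
    ]
    for index, pool in enumerate(graduated_stages(data, num_stages=2), start=1):
        print(index, [s.image_id for s in pool])
```

In a self-training loop, each pool would be used in turn for one round of training on the current pseudo masks, so that harder images enter only after the model has seen the easier ones.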

    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. camouflaged instance segmentation
    2. text-prompt
    3. weakly-supervised

    Qualifiers

    • Research-article

    Conference

    MM '24
    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
