DOI: 10.1145/3652583.3658073
research-article

NeurNCD: Novel Class Discovery via Implicit Neural Representation

Published: 07 June 2024

Abstract

Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are discrete, hole-prone, and noisy, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery. In place of traditional explicit 3D segmentation maps, NeurNCD employs a carefully designed Embedding-NeRF model combined with KL divergence to aggregate semantic embedding and entropy in visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation, and clustering, which facilitate efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and implicit neural representations. As a result, our framework achieves superior segmentation performance in both open-world and closed-world settings without relying on densely labelled datasets for supervised training or on human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.
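
The abstract names KL divergence and entropy but does not say how either is computed. As a generic illustration only, and not the NeurNCD implementation, the sketch below computes both quantities for two per-pixel class distributions. The names `network_probs` and `rendered_probs`, and the example scores, are hypothetical and chosen purely for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-12):
    """Shannon entropy in nats; higher values mean a less certain prediction."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; zero when the two distributions agree."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical per-pixel class scores from two sources (assumed values).
network_probs = softmax([2.0, 0.5, 0.1])   # confident prediction
rendered_probs = softmax([0.3, 0.2, 0.1])  # nearly uniform prediction

print("entropy (network): ", entropy(network_probs))
print("entropy (rendered):", entropy(rendered_probs))
print("KL(network || rendered):", kl_divergence(network_probs, rendered_probs))
```

A nearly uniform distribution has high entropy, which is one common signal that a pixel may not belong to any known class. KL divergence is asymmetric, so the argument order matters when comparing two predictions.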

    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature fusion
    2. neural radiation field
    3. novel class discovery
    4. novel view synthesis
    5. visual embedding space

    Qualifiers

    • Research-article

    Conference

    ICMR '24

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%
