research-article

NeurNCD: Novel Class Discovery via Implicit Neural Representation

Authors:

Yi ShiAuthors Info & Claims

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

Pages 257 - 265

https://doi.org/10.1145/3652583.3658073

Published: 07 June 2024 Publication History

Abstract

Discovering novel classes in open-world settings is crucial for real-world applications. Traditional explicit representations, such as object descriptors or 3D segmentation maps, are constrained by their discrete, hole-prone, and noisy nature, which hinders accurate novel class discovery. To address these challenges, we introduce NeurNCD, the first versatile and data-efficient framework for novel class discovery that employs the meticulously designed Embedding-NeRF model combined with KL divergence as a substitute for traditional explicit 3D segmentation maps to aggregate semantic embedding and entropy in visual embedding space. NeurNCD also integrates several key components, including feature query, feature modulation and clustering, facilitating efficient feature augmentation and information exchange between the pre-trained semantic segmentation network and implicit neural representations. As a result, our framework achieves superior segmentation performance in both open and closed-world settings without relying on densely labelled datasets for supervised training or human interaction to generate sparse label supervision. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches on the NYUv2 and Replica datasets.

References

[1]

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. 2022. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5470--5479.

[2]

Hermann Blum, Marcus G Müller, Abel Gawel, Roland Siegwart, and Cesar Cadena. 2023. SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding. In Robotics Research. Springer, 119--135.

[3]

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV). 801--818.

Digital Library

[4]

Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, and Gang Zeng. 2020. Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XI. Springer, 561--577.

[5]

Zheng Chen, Chen Wang, Yuan-Chen Guo, and Song-Hai Zhang. 2022. Struct-NeRF: Neural Radiance Fields for Indoor Scenes with Structural Hints. arXiv preprint arXiv:2209.05277 (2022).

[6]

Camille Couprie, Clément Farabet, Laurent Najman, and Yann LeCun. 2013. Indoor semantic segmentation using depth information. (2013).

[7]

Jonas Frey, Hermann Blum, Francesco Milano, Roland Siegwart, and Cesar Cadena. 2022. Continual Adaptation of Semantic Segmentation Using Complementary 2D-3D Data Representations. IEEE Robotics and Automation Letters 7, 4 (2022), 11665--11672. https://doi.org/10.1109/LRA.2022.3203812

[8]

Xiao Fu, Shangzhan Zhang, Tianrun Chen, Yichong Lu, Lanyun Zhu, Xiaowei Zhou, Andreas Geiger, and Yiyi Liao. 2022. Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation. arXiv preprint arXiv:2203.15224 (2022).

[9]

Fadri Furrer, Tonci Novkovic, Marius Fehr, Abel Gawel, Margarita Grinvald, Torsten Sattler, Roland Siegwart, and Juan Nieto. 2018. Incremental object data-base: Building 3d models from multiple partial observations. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 6835--6842.

[10]

Haoyu Guo, Sida Peng, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, and Xiaowei Zhou. 2022. Neural 3D Scene Reconstruction with the Manhattan-world Assumption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5511--5520.

[11]

Saurabh Gupta, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. 2015. Indoor scene understanding with rgb-d images: Bottom-up segmentation, object detection and semantic segmentation. International Journal of Computer Vision 112, 2 (2015), 133--149.

Digital Library

[12]

Greg Hamerly and Charles Elkan. 2003. Learning the k in k-means. Advances in neural information processing systems 16 (2003).

[13]

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. (2014).

[14]

Zechao Li, Yanpeng Sun, Liyan Zhang, and Jinhui Tang. 2021. CTNet: Context-based tandem network for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).

[15]

Huayao Liu, Jiaming Zhang, Kailun Yang, Xinxin Hu, and Rainer Stiefelhagen. 2022. CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers. arXiv preprint arXiv:2203.04838 (2022).

[16]

Yun Liu, Peng-Tao Jiang, Vahan Petrosyan, Shi-Jie Li, Jiawang Bian, Le Zhang 0001, and Ming-Ming Cheng. 2018. Del: Deep embedding learning for efficient image segmentation. In IJCAI, Vol. 864. 870.

[17]

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99--106.

Digital Library

[18]

Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. 2022. Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989 (2022).

[19]

Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, and Kris Kitani. 2019. Incremental class discovery for semantic segmentation with RGBD sensing. In Proceedings of the IEEE/CVF international conference on computer vision. 972--981.

[20]

Yoshikatsu Nakajima, Keisuke Tateno, Federico Tombari, and Hideo Saito. 2018. Fast and accurate semantic mapping through geometric-based incremental segmentation. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 385--392.

Digital Library

[21]

Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, et al. 2019. Habitat: A platform for embodied ai research. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9339--9347.

[22]

Daniel Seichter, Mona Köhler, Benjamin Lewandowski, Tim Wengefeld, and Horst-Michael Gross. 2021. Efficient rgb-d semantic segmentation for indoor scene analysis. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 13525--13531.

Digital Library

[23]

Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. 2012. Indoor segmentation and support inference from rgbd images. In European conference on computer vision. Springer, 746--760.

Digital Library

[24]

Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition. 567--576.

[25]

Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, et al. 2019. The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019).

[26]

Zhiqiang Tao, Hongfu Liu, Huazhu Fu, and Yun Fu. 2019. Multi-view saliency-guided clustering for image cosegmentation. IEEE Transactions on Image Processing 28, 9 (2019), 4634--4645.

Digital Library

[27]

Keisuke Tateno, Federico Tombari, and Nassir Navab. 2015. Real-time and scalable incremental segmentation on dense slam. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 4465--4472.

Digital Library

[28]

Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. 2022. Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12922--12931.

[29]

Han Vanholder. 2016. Efficient inference with tensorrt. In GPU Technology Conference, Vol. 1. 2.

[30]

Suhani Vora, Noha Radwan, Klaus Greff, Henning Meyer, Kyle Genova, Mehdi SM Sajjadi, Etienne Pot, Andrea Tagliasacchi, and Daniel Duckworth. 2021. Nesf: Neural semantic fields for generalizable semantic segmentation of 3d scenes. arXiv preprint arXiv:2111.13260 (2021).

[31]

Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, and Sergey Tulyakov. 2022. R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis. arXiv preprint arXiv:2203.17261 (2022).

[32]

Junming Wang, Zekai Sun, Xiuxian Guan, Tianxiang Shen, Zongyuan Zhang, Tianyang Duan, Dong Huang, Shixiong Zhao, and Heming Cui. 2024. AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments. arXiv preprint arXiv:2403.11607 (2024).

[33]

Xiaoyang Wang, Jimin Xiao, Bingfeng Zhang, and Limin Yu. 2022. CARD: Semi-supervised semantic segmentation via class-agnostic relation based denoising. In Proc. IJCAI. 1451--1457.

[34]

Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, and Yunhe Wang. 2022. Multimodal token fusion for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12186--12195.

[35]

Zijin Wu, Xingyi Li, Juewen Peng, Hao Lu, Zhiguo Cao, and Weicai Zhong. 2022. DoF-NeRF: Depth-of-Field Meets Neural Radiance Fields. In Proceedings of the 30th ACM International Conference on Multimedia. 1718--1729.

Digital Library

[36]

Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. 2022. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5438--5448.

[37]

Rui Xu and Donald Wunsch. 2005. Survey of clustering algorithms. IEEE Transactions on neural networks 16, 3 (2005), 645--678.

Digital Library

[38]

Hong-Ming Yang, Xu-Yao Zhang, Fei Yin, Qing Yang, and Cheng-Lin Liu. 2020. Convolutional prototype network for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 5 (2020), 2358--2370.

[39]

Minxiang Ye, Yifei Zhang, Shiqiang Zhu, Anhuan Xie, and Dan Zhang. 2022. Deep Markov Clustering for Panoptic Segmentation. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2380--2384.

[40]

Yuyang Zhao, Zhun Zhong, Nicu Sebe, and Gim Hee Lee. 2022. Novel Class Discovery in Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4340--4349.

[41]

Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, and Andrew J Davison. 2021. In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 15838--15847.

[42]

Shuaifeng Zhi, Edgar Sucar, Andre Mouton, Iain Haughton, Tristan Laidlow, and Andrew J Davison. 2022. iLabel: Revealing Objects in Neural Fields. IEEE Robotics and Automation Letters (2022).

[43]

Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. 2018. Open3D: A modern library for 3D data processing. arXiv preprint arXiv:1801.09847 (2018).

Index Terms

NeurNCD: Novel Class Discovery via Implicit Neural Representation
1. Computing methodologies
  1. Computer graphics
    1. Image manipulation
      1. Image-based rendering

Recommendations

Class-Incremental Novel Class Discovery
Computer Vision – ECCV 2022
Abstract
We study the new task of class-incremental Novel Class Discovery (class-iNCD), which refers to the problem of discovering novel categories in an unlabelled data set by leveraging a pre-trained model that has been trained on a labelled data set ...
Self-cooperation Knowledge Distillation for Novel Class Discovery
Computer Vision – ECCV 2024
Abstract
Novel Class Discovery (NCD) aims to discover unknown and novel classes in an unlabeled set by leveraging knowledge already learned about known classes. Existing works focus on instance-level or class-level knowledge representation and build a ...
Large-Scale Pre-trained Models are Surprisingly Strong in Incremental Novel Class Discovery
Pattern Recognition
Abstract
Discovering novel concepts in unlabelled datasets and in a continuous manner is an important desideratum of lifelong learners. In the literature such problems have been partially addressed under very restricted settings, where novel classes are ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

May 2024

1379 pages

ISBN:9798400706196

DOI:10.1145/3652583

General Chairs:
Cathal Gurrin
Dublin City University, Ireland
,
Rachada Kongkachandra
Thammasat University, Thailand
,
Klaus Schoeffmann
Klagenfurt University, Austria
,
Program Chairs:
Duc-Tien Dang-Nguyen
University of Bergen, Norway
,
Luca Rossetto
University of Zurich, Switzerland
,
Shin'ichi Satoh
National Institute of Informatics, Japan
,
Liting Zhou
Dublin City University, Ireland

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 June 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ICMR '24

Sponsor:

ICMR '24: International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
77
Total Downloads

Downloads (Last 12 months)77
Downloads (Last 6 weeks)15

Reflects downloads up to 17 Feb 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten