Gradual adaption with memory mechanism for image-based 3D model retrieval

https://doi.org/10.1016/j.imavis.2022.104482

Highlights

  • An end-to-end unsupervised 2D image-based 3D model retrieval framework.

  • Transferring knowledge from labeled 2D images to unlabeled 3D models.

  • Domain-invariant features are disentangled from the original features.

  • A memory module enhances domain-invariant features with class-representative features.

  • Experiments on MI3DOR and MI3DOR-2 verify the superiority of the method.

Abstract

With the development of 3D modeling technology and its wide application across different fields, the number of 3D models is increasing rapidly, making 3D model retrieval a hot topic in current research. Compared with other 3D model retrieval methods, 2D image-based unsupervised 3D model retrieval takes 2D images, which carry rich labels and are easy to obtain, as queries, and also accounts for the difficulty of labeling 3D models. It is a retrieval task involving cross-domain adaptation, whose main challenge is the large domain gap. In this paper, we propose a cross-domain 3D model retrieval method with a memory mechanism based on disentangled feature learning. Disentangled feature learning separates the entangled original features into isolated domain-invariant features and domain-specific features, where the former are aligned to narrow the domain gap. On this basis, the memory mechanism selects, for every sample, feature vectors from class memory modules constructed from class-representative features of the opposite domain, which are used to update the domain-invariant features with a gradually adjusted weight. The memory mechanism thus gradually improves the adaptability of the model to the two very different domains. Experiments are conducted on the public datasets MI3DOR and MI3DOR-2 to verify the feasibility and superiority of the proposed method. On the MI3DOR-2 dataset in particular, our method outperforms the current state-of-the-art methods with a gain of 7.71% on the strictest retrieval metric, NN.

Introduction

Advances in computer software and hardware have improved many disciplines and industries, and the types of information that can be obtained, presented and used are becoming ever richer [1,2]. In particular, 3D technology has gradually matured and attracted growing attention. 3D modeling technology captures the spatial structure of an object, presenting it in a form closer to human perception. At present, 3D modeling technology has been applied to many industries and fields, such as manufacturing, construction, medicine and the cultural industries. The management of 3D models is an important part of 3D technology.

In the whole process of 3D model management, 3D model retrieval plays a vital role. Although more and more software and hardware tools try to simplify the modeling process and reduce its technical difficulty, 3D modeling still requires expertise and high manual cost. This means that in many scenarios it is more convenient and efficient to retrieve an existing 3D model directly than to build one from scratch. However, with the widespread application of 3D modeling technology, the number of 3D models keeps increasing. In particular, in recent years 3D modeling technology has been successfully deployed on mobile terminals, freeing it from hardware limitations and gradually winning a wider user group, which has caused a surge in the number of 3D models, a growth trend that will continue in the future. Under this premise, finding the needed 3D model in such a mass of data has become a practical and challenging task, drawing scholars' attention to 3D model retrieval methods.

Up to now, a lot of related work has emerged in 3D model retrieval [3,4]. The current mainstream is to apply deep learning, which brings impressive performance [[5], [6], [7]]. It is worth noting that the success of deep learning-based methods depends on the annotation of 3D models. However, annotating 3D models requires high manual cost. Given that the number of 3D models is huge and still increasing rapidly, the resources needed to annotate them may be unacceptable. Therefore, seeking a method independent of annotation is a direction worth exploring for 3D model retrieval. One option is to transfer knowledge from data in other domains with abundant annotation: domain adaptation ensures that knowledge gained from a label-rich domain can be applied to different but related domains. Images can serve as the source domain because many widely used image datasets carry reliable annotations. At the same time, compared with sketches, real-world images contain more details, which can bring higher retrieval accuracy. Therefore, domain adaptation can be performed between the labeled 2D image domain and the unlabeled 3D model domain, and 3D models can be retrieved using 2D images, which leads to the task of unsupervised 2D image-based 3D model retrieval.

2D image-based 3D model retrieval is especially challenging due to the significant feature-space gap between real 2D images and 3D models. Images generally come from real-world objects and scenes shot by a camera, while 3D models are generally made on a computer. The huge visual difference between the two domains reflects their diverse data distributions. Previous cross-domain 3D model retrieval methods tend to perform alignment globally and in a fixed manner to reduce the gap between the two domains, which cannot cope with the large domain gap in the 2D image-based 3D model retrieval task, because domain-invariant features and domain-specific features are entangled with each other, and the domain-specific features interfere with feature alignment and lead to negative transfer during domain adaptation. In this paper, we design a memory mechanism within a disentangled feature learning framework to enhance the original representations and gradually improve the adaptability of the model to the target domain. Our contributions can be summarized as follows:

  • 1.

    We propose an end-to-end framework for the 2D image-based unsupervised 3D model retrieval task, which gradually transfers knowledge from labeled 2D images to unlabeled 3D models. Its effectiveness is demonstrated by experiments conducted on the MI3DOR and MI3DOR-2 datasets.

  • 2.

    We design an incremental memory mechanism built on class memory modules. The memory mechanism uses the class memory modules to update the features used for alignment, which reduces the large domain gap in a progressive way.
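The class memory module described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the `ClassMemory` name, the exponential-moving-average update, and the mixing weight `alpha` are all assumptions consistent with the idea of storing one representative feature per class and gradually blending it into a sample's domain-invariant feature.

```python
import numpy as np

class ClassMemory:
    """Hypothetical per-class memory bank of representative features."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.bank = np.zeros((num_classes, dim))  # one slot per class
        self.momentum = momentum

    def write(self, feat, label):
        # Update the class slot with an exponential moving average,
        # keeping a "representative" feature for that class.
        self.bank[label] = (self.momentum * self.bank[label]
                            + (1.0 - self.momentum) * feat)

    def read(self, feat, label, alpha):
        # Enhance a sample's domain-invariant feature with the
        # representative feature of the same class from the opposite
        # domain; alpha can grow during training so the adaptation
        # proceeds gradually.
        return (1.0 - alpha) * feat + alpha * self.bank[label]
```

In this sketch, one `ClassMemory` instance would be kept per domain and filled with features from the opposite domain, so that each sample's feature is pulled toward its class representative across the gap.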

Section snippets

3D retrieval

3D model retrieval aims to find the 3D models in a database that match a given query. Current mainstream methods can be roughly divided into two categories: model-based 3D model retrieval methods [[8], [9], [10], [11], [12], [13], [14]] and image-based 3D model retrieval methods [3,[15], [16], [17], [18], [19], [20]].

Model-based 3D model retrieval refers to a retrieval method in which the queries are 3D models. The focus of the model-based 3D model retrieval method is to find

Overview

The aim of the 2D image-based 3D model retrieval task is to retrieve 3D models that match the query 2D images. In this task, 2D images are defined as the source domain D_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, where x_i^s is the i-th image with its corresponding label y_i^s ∈ [0, J − 1]; n_s and J are the number of image samples and the number of classes, respectively. Unlabeled 3D models are defined as the target domain D_t = {x_i^t}_{i=1}^{n_t}, which contains n_t unlabeled 3D model samples x_i^t. The whole framework we proposed is
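Under this notation, retrieval reduces to ranking the n_t target-domain (3D model) features by similarity to a query image feature. A minimal sketch, assuming cosine similarity as the matching function (the snippet does not specify the paper's actual similarity measure):

```python
import numpy as np

def retrieve(query_feat, gallery_feats, top_k=5):
    # query_feat: feature of a source-domain 2D image query, shape (d,)
    # gallery_feats: features of the target-domain 3D models, shape (n_t, d)
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity to every 3D model
    return np.argsort(-sims)[:top_k]   # indices of the best-matching models
```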

Dataset

MI3DOR [48] is a public dataset containing 21,000 2D images and 7690 3D models from the same 21 categories, divided into a training set and a test set. The training set includes 10,500 2D images and 3842 3D models covering all categories. Each 3D model is represented by 12 views. The test set includes the remaining 2D images and 3D models.

MI3DOR-2 [1] contains 19,694 2D images and 3982 3D models (also represented by 12 views) from 40 categories. The training set includes 19,294 2D images and 3182 3D
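The abstract reports a 7.71% gain on the NN metric. Assuming the common definition of NN in retrieval benchmarks (the precision of the single top-ranked result; the snippet does not define it explicitly), it can be computed as:

```python
import numpy as np

def nn_metric(query_labels, ranked_labels):
    # NN: fraction of queries whose top-ranked retrieved 3D model
    # belongs to the query's class (assumed definition).
    hits = [q == r[0] for q, r in zip(query_labels, ranked_labels)]
    return float(np.mean(hits))
```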

Conclusion

Aiming at the task of 3D model retrieval based on 2D images, this paper proposes a framework combined with a memory mechanism. To address the challenge of the large domain gap in this task, the framework is equipped with a memory mechanism based on disentangled feature learning. The extracted original visual features are disentangled to obtain domain-invariant features. The domain-invariant features learned at the beginning of training are still affected by domain divergence and cannot achieve the

CRediT authorship contribution statement

Dan Song: Conceptualization, Methodology. Yuting Ling: Data curation, Software, Writing – original draft. Tianbao Li: Investigation, Writing – review & editing, Software. Ting Zhang: Investigation, Writing – original draft. Guoqing Jin: Writing – review & editing. Junbo Guo: Software, Validation. Xuanya Li: Visualization.

Declaration of Competing Interest

None.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61902277), the State Key Laboratory of Communication Content Cognition (Grant No. A02106), the Open Funding Project of the State Key Laboratory of Communication Content Cognition (Grant No. 20K04) and the Baidu Program.

References (56)

  • Seong-heum Kim et al.

    Category-specific upright orientation estimation for 3D model classification and retrieval

    Image Vis. Comput.

    (2020)
  • Zongsheng Yue et al.

    Semi-supervised learning through adaptive Laplacian graph trimming

    Image Vis. Comput.

    (2017)
  • Heyu Zhou et al.

    Dual-level embedding alignment network for 2D image-based 3D object retrieval

  • Pengzhen Ren et al.

    A comprehensive survey of neural architecture search: challenges and solutions

    ACM Comput. Surv.

    (2021)
  • Hang Su et al.

    Multi-view convolutional neural networks for 3D shape recognition

  • Charles R. Qi et al.

    PointNet: Deep learning on point sets for 3D classification and segmentation

  • Heyu Zhou et al.

    Semantic consistency guided instance feature alignment for 2D image-based 3D shape retrieval

  • Zhihui Li et al.

    Dynamic affinity graph construction for spectral clustering using multiple features

    IEEE Trans. Neural Netw. Learn. Syst.

    (2018)
  • Xiaojun Chang et al.

    Compound rank-k projections for bilinear analysis

    IEEE Trans. Neural Netw. Learn. Syst.

    (2015)
  • PointNet: A 3D convolutional neural network for real-time object class recognition

  • Charles Ruizhongtai Qi et al.

    PointNet++: Deep hierarchical feature learning on point sets in a metric space

  • Daniel Maturana et al.

    VoxNet: A 3D convolutional neural network for real-time object recognition

  • Zhirong Wu et al.

    3D ShapeNets: A deep representation for volumetric shapes

  • Nima Sedaghat et al.

    Orientation-boosted voxel nets for 3D object recognition

    (2016)
  • Weizhi Nie et al.

    DAN: Deep-attention network for 3D shape recognition

    IEEE Trans. Image Process.

    (2021)
  • Song Bai et al.

    GIFT: Towards scalable 3D shape retrieval

    IEEE Trans. Multimedia

    (2017)
  • Alexander Grabner et al.

    3D pose estimation and 3D model retrieval for objects in the wild

  • Xinwei He et al.

    Triplet-center loss for multi-view 3D object retrieval

  • Zhaoqun Li et al.

    Angular triplet-center loss for multi-view 3D shape retrieval

  • Jin Xie et al.

    Learning barycentric representations of 3D shapes for sketch-based 3D shape retrieval

  • Wei-Zhi Nie et al.

    Deep correlated joint network for 2D image-based 3D model retrieval

    IEEE Trans. Cybern.

    (2020)
  • GVCNN: Group-view convolutional neural networks for 3D shape recognition

  • Fang Wang et al.

    Sketch-based 3D shape retrieval using convolutional neural networks

  • Fan Zhu et al.

    Learning cross-domain neural networks for sketch-based 3D shape retrieval

  • Guoxian Dai et al.

    Deep correlated metric learning for sketch-based 3D shape retrieval

  • Pan-pan Mu et al.

    Image-based 3D model retrieval using manifold learning

    Front. Inform. Technol. Electron. Eng.

    (2018)
  • Heyu Zhou et al.

    Hierarchical instance feature alignment for 2D image-based 3D shape retrieval

  • Alexander Grabner et al.

    Location field descriptors: Single image 3D model retrieval in the wild
