Gradual adaption with memory mechanism for image-based 3D model retrieval

doi:10.1016/j.imavis.2022.104482

Image and Vision Computing

Volume 123, July 2022, 104482

https://doi.org/10.1016/j.imavis.2022.104482

Highlights

• An end-to-end unsupervised 2D image-based 3D model retrieval framework.
• Transferring knowledge from labeled 2D images to unlabeled 3D models.
• Domain-invariant features are disentangled from the original features.
• Memory module enhances domain-invariant features by representative features.
• Experiments on MI3DOR and MI3DOR-2 verified the superiority of the method.

Abstract

With the development of 3D modeling technology and its wide application in different fields, the number of 3D models is increasing rapidly, making 3D model retrieval an active research topic. Unlike other 3D model retrieval methods, 2D image-based unsupervised 3D model retrieval uses 2D images, which are richly labeled and easy to obtain, as queries, and thereby takes into account the difficulty of labeling 3D models. The task involves cross-domain adaptation, and its main challenge is the large domain gap. In this paper, we propose a cross-domain 3D model retrieval method that builds a memory mechanism on disentangled feature learning. Disentangled feature learning separates the entangled original features into domain-invariant features and domain-specific features; the former are aligned to narrow the domain gap. On this basis, the memory mechanism selects, for every sample, feature vectors from class memory modules constructed from the class representative features of the opposite domain, and uses them to update the domain-invariant features with gradient weight. The memory mechanism thus gradually improves the adaptability of the model to two very different domains. Experiments on the public datasets MI3DOR and MI3DOR-2 verify the feasibility and superiority of the proposed method. On MI3DOR-2 in particular, our method outperforms the current state-of-the-art methods with a gain of 7.71% on NN, the strictest retrieval metric.
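For readers unfamiliar with the NN metric mentioned above, the following is a minimal sketch of how it is commonly computed: the fraction of queries whose single closest retrieved item belongs to the query's class. The function name `nn_precision`, the use of cosine similarity, and the toy arrays are all assumptions for illustration; the abstract does not state which distance measure the paper uses.

```python
import numpy as np

def nn_precision(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose top-1 retrieved gallery item shares their label."""
    # L2-normalise rows so that the dot product equals cosine similarity
    # (cosine similarity is an assumption, not taken from the paper).
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                      # (num_queries, num_gallery)
    top1 = sim.argmax(axis=1)          # index of the closest gallery item
    return float(np.mean(gallery_labels[top1] == query_labels))

# Toy data: three image queries and three 3D-model gallery items in 2-D.
queries = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_labels = np.array([0, 1, 1])
gallery = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.7]])
gallery_labels = np.array([0, 1, 0])

score = nn_precision(queries, query_labels, gallery, gallery_labels)
print(f"NN = {score:.3f}")
```

In this toy setup the first two queries retrieve an item of their own class and the third does not, so two of three queries count as correct. Because only the top-ranked result is scored, a method receives no credit for relevant items ranked second or lower, which is why NN is regarded as a strict metric.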

Introduction

Advances in computer software and hardware have benefited many disciplines and industries, and the types of information that can be obtained, presented and used are becoming increasingly rich [1,2]. In particular, 3D technology has attracted growing attention as it has matured. 3D modeling technology captures the spatial structure of an object, presenting the object in a form closer to human perception. At present, 3D modeling technology is applied in many industries and fields, such as manufacturing, construction, medicine and the cultural industries. The management of 3D models is an important part of 3D technology.

3D model retrieval plays a vital role throughout 3D model management. Although more and more software and hardware tries to simplify the modeling process and reduce its technical difficulty, 3D modeling still requires expertise and high manual cost. In many scenarios it is therefore more convenient and efficient to retrieve an existing 3D model than to build one from scratch. However, with the widespread application of 3D modeling technology, the number of 3D models is also increasing. In particular, 3D modeling technology has in recent years been successfully deployed on mobile terminals, which frees it from hardware limitations and has gradually won it a wider user group. The result is a surge in the number of 3D models, and this growth is expected to continue. Finding the required 3D model in this mass of data has thus become a practical and challenging task, and research on 3D model retrieval methods has attracted the attention of scholars.

A great deal of work on 3D model retrieval has emerged [3,4]. The current mainstream approach applies deep learning, which brings impressive performance [5,6,7]. It is worth noting that the success of deep learning-based methods depends on annotations of the 3D models, and annotating 3D models carries a high manual cost. Given that the number of 3D models is huge and still increasing rapidly, the resources needed to annotate them may be unacceptable. Seeking a method that does not depend on annotation is therefore a direction worth exploring in 3D model retrieval research. One option is to transfer knowledge from data in other domains with abundant annotation. Domain adaptation aims to ensure that knowledge gained from one label-rich domain can be applied well to different but related domains. Images can serve as the source domain because many widely used image datasets have reliable annotations. Moreover, compared with sketches, real-world images contain more details, which can bring higher retrieval accuracy. Domain adaptation can therefore be performed between the labeled 2D image domain and the unlabeled 3D model domain so that 3D models can be retrieved using 2D images. This leads to the task called unsupervised 3D model retrieval based on 2D images.

2D image-based 3D object retrieval is especially challenging because of the significant gap in feature space between real 2D images and 3D objects. The images generally show real-world objects and scenes captured by a camera, while the 3D models are generally created on a computer. The large visual difference between the two domains reflects their different data distributions. Previous cross-domain 3D model retrieval methods tend to perform alignment globally and in a fixed manner to reduce the gap between the two domains. This cannot cope with the large domain gap in the 2D image-based 3D model retrieval task, because the domain-invariant features and domain-specific features are entangled with each other, and the domain-specific features interfere with feature alignment and lead to negative transfer during domain adaptation. In this paper, we design a memory mechanism within a disentangled feature learning framework to enhance the original representations and gradually improve the adaptability of the model to the target domain. Our contributions can be summarized as follows:

1. We propose an end-to-end framework for the 2D image-based unsupervised 3D model retrieval task, which gradually transfers knowledge from labeled 2D images to unlabeled 3D models. Its effectiveness is demonstrated by experiments on the MI3DOR and MI3DOR-2 datasets.
2. We design an incremental memory mechanism involving a class memory module. The memory mechanism uses the class memory module to update the features used for alignment, which reduces the large domain gap progressively.
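To make the idea of a class memory more concrete, the sketch below shows one possible reading of it, not the authors' implementation. It assumes that (a) a class representative feature is the class mean, (b) each sample selects the most similar memory vector by cosine similarity, and (c) the update weight ramps up linearly during training. All function names (`build_class_memory`, `select_from_memory`, `gradual_weight`, `enhance`) and the `max_weight` parameter are hypothetical.

```python
import numpy as np

def build_class_memory(feats, labels, num_classes):
    """One representative vector per class; the class mean is an assumption."""
    memory = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            memory[c] = feats[mask].mean(axis=0)
    return memory

def select_from_memory(feats, memory):
    """For every sample, pick the most similar memory vector (cosine similarity)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-12)
    return memory[(f @ m.T).argmax(axis=1)]

def gradual_weight(step, total_steps, max_weight=0.5):
    """Hypothetical schedule: linear ramp from 0 to max_weight over training."""
    return max_weight * min(step / total_steps, 1.0)

def enhance(feats, opposite_memory, step, total_steps):
    """Blend each feature with its selected vector from the opposite domain."""
    w = gradual_weight(step, total_steps)
    return (1.0 - w) * feats + w * select_from_memory(feats, opposite_memory)

# Toy example: labeled image features (source) build the memory that is
# used to enhance unlabeled 3D-model features (target).
image_feats = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]])
image_labels = np.array([0, 0, 1, 1])
model_feats = np.array([[0.6, 0.4], [0.3, 0.7]])

image_memory = build_class_memory(image_feats, image_labels, num_classes=2)
start = enhance(model_feats, image_memory, step=0, total_steps=10)
end = enhance(model_feats, image_memory, step=10, total_steps=10)
print(start)
print(end)
```

At step 0 the features are left unchanged, and by the end of the ramp each 3D-model feature has moved halfway toward its selected image-class representative. Building a memory from the unlabeled 3D-model side would additionally require some class assignment for those samples, which this sketch does not attempt.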

Section snippets

3D retrieval

3D model retrieval aims to find the 3D models in a database that match the queries. The current mainstream methods can be roughly divided into two kinds: model-based 3D model retrieval methods [8,9,10,11,12,13,14] and image-based 3D model retrieval methods [3,15,16,17,18,19,20].

Model-based 3D model retrieval refers to a retrieval method in which the queries are 3D models. The focus of the model-based 3D model retrieval method is to find

Overview

The aim of the 2D image-based 3D model retrieval task is to retrieve 3D models matched to the query 2D images. In this task, 2D images are defined as the source domain $D_s = \{(x_s^i, y_s^i)\}_{i=1}^{n_s}$, where $x_s^i$ is the $i$-th image with its corresponding label $y_s^i \in [0, J-1]$. $n_s$ and $J$ are the number of image samples and the number of classes, respectively. Unlabeled 3D models are defined as the target domain $D_t = \{x_t^i\}_{i=1}^{n_t}$, which contains $n_t$ unlabeled 3D model samples $x_t$. The whole framework we proposed is

Dataset

MI3DOR [48] is a public dataset containing 21,000 2D images and 7,690 3D models from the same 21 categories, divided into a training set and a test set. The training set includes 10,500 2D images and 3,842 3D models covering all categories. Each 3D model is represented by 12 views. The test set includes the remaining 2D images and 3D models.
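The MI3DOR test-set sizes are not stated explicitly, but they follow from the totals and training figures above. The short calculation below derives them, along with the number of rendered views in the test set under the stated 12 views per model. The variable names are illustrative only.

```python
# MI3DOR figures as given in the text above.
total = {"images": 21_000, "models": 7_690}
train = {"images": 10_500, "models": 3_842}
views_per_model = 12

# The test set holds whatever is not in the training set.
test = {kind: total[kind] - train[kind] for kind in total}

# Each 3D model contributes 12 rendered views.
test_view_images = test["models"] * views_per_model

print(test)              # test-set images and models
print(test_view_images)  # rendered views in the test set
```

This gives 10,500 test images and 3,848 test 3D models, the latter corresponding to 46,176 rendered views.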

MI3DOR-2 [1] contains 19,694 2D images and 3,982 3D models (also represented by 12 views) from 40 categories. The training set includes 19,294 2D images and 3,182 3D

Conclusion

Aiming at the task of 3D model retrieval based on 2D images, this paper proposes a framework combined with a memory mechanism. To address the large domain gap in this task, the framework builds the memory mechanism on disentangled feature learning. The extracted original visual features are disentangled to obtain domain-invariant features. The domain-invariant features learned at the beginning of training are still affected by domain divergence and cannot achieve the

CRediT authorship contribution statement

Dan Song: Conceptualization, Methodology. Yuting Ling: Data curation, Software, Writing – original draft. Tianbao Li: Investigation, Writing – review & editing, Software. Ting Zhang: Investigation, Writing – original draft. Guoqing Jin: Writing – review & editing. Junbo Guo: Software, Validation. Xuanya Li: Visualization.

Declaration of Competing Interest

None.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61902277), State Key Laboratory of Communication Content Cognition (Grant No. A02106), the Open Funding Project of the State Key Laboratory of Communication Content Cognition (Grant No. 20K04) and the Baidu Program.

References (56)

Seong-heum Kim et al.
Category-specific upright orientation estimation for 3d model classification and retrieval
Image Vis. Comput.
(2020)

Zongsheng Yue et al.
Semi-supervised learning through adaptive laplacian graph trimming
Image Vis. Comput.
(2017)

Heyu Zhou et al.
Dual-level embedding alignment network for 2d image-based 3d object retrieval

Pengzhen Ren et al.
A comprehensive survey of neural architecture search: challenges and solutions
ACM Comput. Surv.
(2021)

Su Hang et al.
Multi-view convolutional neural networks for 3d shape recognition

Charles R. Qi et al.
Pointnet: Deep learning on point sets for 3d classification and segmentation

Heyu Zhou et al.
Semantic consistency guided instance feature alignment for 2d image-based 3d shape retrieval

Zhihui Li et al.
Dynamic affinity graph construction for spectral clustering using multiple features
IEEE Trans. Neural Netw. Learn. Syst.
(2018)

Xiaojun Chang et al.
Compound rank-k projections for bilinear analysis
IEEE Trans. Neural Netw. Learn. Syst.
(2015)

Pointnet: A 3d convolutional neural network for real-time object class recognition

Charles Ruizhongtai Qi et al.
Pointnet++: Deep hierarchical feature learning on point sets in a metric space

Daniel Maturana et al.
Voxnet: A 3d convolutional neural network for real-time object recognition

Wu Zhirong et al.
3d shapenets: A deep representation for volumetric shapes

Nima Sedaghat et al.
Orientation-Boosted Voxel Nets for 3d Object Recognition
(2016)

Weizhi Nie et al.
Dan: deep-attention network for 3d shape recognition
IEEE Trans. Image Process.
(2021)

Song Bai et al.
Gift: Towards scalable 3d shape retrieval
IEEE Trans. Multimedia
(2017)

Alexander Grabner et al.
3d pose estimation and 3d model retrieval for objects in the wild

Xinwei He et al.
Triplet-center loss for multi-view 3d object retrieval

Zhaoqun Li et al.
Angular triplet-center loss for multi-view 3d shape retrieval

Jin Xie et al.
Learning barycentric representations of 3d shapes for sketch-based 3d shape retrieval

Wei-Zhi Nie et al.
Deep correlated joint network for 2-d image-based 3-d model retrieval
IEEE Trans. Cybern.
(2020)

Gvcnn: Group-view convolutional neural networks for 3d shape recognition

Fang Wang et al.
Sketch-based 3d shape retrieval using convolutional neural networks

Fan Zhu et al.
Learning cross-domain neural networks for sketch-based 3d shape retrieval

Guoxian Dai et al.
Deep correlated metric learning for sketch-based 3d shape retrieval

Mu Pan-pan et al.
Image-based 3d model retrieval using manifold learning
Front. Inform. Technol. Electron. Eng.
(2018)

Heyu Zhou et al.
Hierarchical instance feature alignment for 2d image-based 3d shape retrieval