Gradual adaption with memory mechanism for image-based 3D model retrieval
Introduction
Advances in computer software and hardware have improved many disciplines and industries, and the types of information that can be obtained, presented and used are becoming increasingly rich [1,2]. In particular, 3D technology has gradually matured from an emerging technology and attracted growing attention. 3D modeling captures the spatial structure of an object, presenting it in a form closer to human perception. 3D modeling has already been applied in many industries and fields, such as manufacturing, construction, medicine and the cultural industries. The management of 3D models is an important part of 3D technology.
Within 3D model management, 3D model retrieval plays a vital role. Although a growing range of software and hardware aims to simplify the modeling process and lower its technical barrier, 3D modeling still requires expertise and considerable manual effort. In many scenarios, it is therefore more convenient and efficient to retrieve an existing 3D model than to build one from scratch. At the same time, the widespread adoption of 3D modeling technology keeps increasing the number of 3D models. In recent years in particular, 3D modeling has been deployed successfully on mobile terminals, freeing it from hardware limitations and bringing it a wider user base. This has caused a surge in the number of 3D models, and the trend is expected to continue. Under these conditions, finding the needed 3D model in a mass of data has become a practical and challenging task, and 3D model retrieval has attracted the attention of researchers.
A large body of work on 3D model retrieval has emerged [3,4]. The current mainstream applies deep learning, which has brought impressive performance [[5], [6], [7]]. Notably, the success of deep learning-based methods depends on annotations of 3D models, which are costly to produce manually. Given that the number of 3D models is huge and still growing rapidly, the resources needed to annotate them may be prohibitive. A retrieval method that does not depend on 3D annotations is therefore a direction worth exploring. One option is to transfer knowledge from another domain with abundant annotations. Domain adaptation aims to ensure that knowledge learned in a label-rich domain can be applied well to a different but related domain. Images are a natural source domain because many widely used image datasets come with reliable annotations. Moreover, compared with sketches, real-world images contain more detail, which can yield higher retrieval accuracy. Domain adaptation can thus be performed between a labeled 2D image domain and an unlabeled 3D model domain, so that 3D models can be retrieved with 2D image queries. This task is called unsupervised 3D model retrieval based on 2D images.
2D image-based 3D object retrieval is particularly challenging because of the large gap in feature space between real 2D images and 3D objects. The images are typically photographs of real-world objects and scenes taken by a camera, whereas the 3D models are typically created by computer. This large visual difference reflects very different data distributions in the two domains. Previous cross-domain 3D model retrieval methods tend to perform a global, fixed alignment to reduce the gap between the two domains. Such alignment cannot cope with the large domain gap in 2D image-based 3D model retrieval, because domain-invariant and domain-specific features are entangled with each other: the domain-specific features interfere with feature alignment and lead to negative transfer during domain adaptation. In this paper, we design a memory mechanism on top of a disentangled feature learning framework to enhance the original representations and gradually improve the model's adaptability to the target domain. Our contributions can be summarized as follows:
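To make the entanglement argument concrete, the toy sketch below (an illustration, not the paper's method) measures the domain gap with a linear-kernel maximum mean discrepancy, i.e. the squared distance between the two domain means. It does so once on entangled features and once on only the coordinates assumed to be domain-invariant. The feature values and the fixed split into invariant and domain-specific coordinates are hypothetical; the paper learns its disentanglement rather than fixing it by hand.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def linear_mmd(source: List[Vector], target: List[Vector]) -> float:
    """Linear-kernel MMD: squared Euclidean distance between domain means."""
    ms, mt = mean(source), mean(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def keep_dims(vectors: List[Vector], dims: List[int]) -> List[Vector]:
    """Keep only the coordinates assumed (hypothetically) to be domain-invariant."""
    return [[v[d] for d in dims] for v in vectors]


# Toy features: dims 0-1 encode the class, dim 2 is a domain-specific offset
# (e.g. photographic texture vs. rendered-view appearance).
images = [[1.0, 0.0, 5.0], [0.0, 1.0, 5.0]]
models = [[1.0, 0.0, -5.0], [0.0, 1.0, -5.0]]

entangled_gap = linear_mmd(images, models)
invariant_gap = linear_mmd(keep_dims(images, [0, 1]), keep_dims(models, [0, 1]))
print(entangled_gap, invariant_gap)  # prints: 100.0 0.0
```

Here the class-bearing coordinates already agree, so the entire measured gap comes from the domain-specific coordinate. A global alignment on the entangled features would spend its effort erasing that coordinate, which is one way domain-specific information can interfere with alignment.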
1. We propose an end-to-end framework for the 2D image-based unsupervised 3D model retrieval task, which gradually transfers knowledge from labeled 2D images to unlabeled 3D models. Experiments on the MI3DOR and MI3DOR-2 datasets demonstrate its effectiveness.
2. We design an incremental memory mechanism built around a class memory module. The class memory module updates the features used for alignment, which reduces the large domain gap progressively.
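The excerpt does not give the update rule of the class memory module, so the following is only a minimal sketch under stated assumptions: each class has one memory slot holding a running centroid, slots are updated with a momentum (exponential moving average) rule, unlabeled 3D model features receive pseudo-labels from their most similar slot by cosine similarity, and a feature is "enhanced" by blending it with its class slot. `ClassMemory`, `momentum`, `pseudo_label` and `enhance` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; a zero vector is treated as having unit norm."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class ClassMemory:
    """Hypothetical class memory: one running-centroid slot per class."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9) -> None:
        self.momentum = momentum
        self.slots: List[Vector] = [[0.0] * dim for _ in range(num_classes)]
        self.filled = [False] * num_classes

    def update(self, feature: Vector, label: int) -> None:
        """EMA update of the slot for `label`; the first write copies the feature."""
        if not self.filled[label]:
            self.slots[label] = list(feature)
            self.filled[label] = True
            return
        m = self.momentum
        self.slots[label] = [
            m * s + (1.0 - m) * f for s, f in zip(self.slots[label], feature)
        ]

    def pseudo_label(self, feature: Vector) -> int:
        """Index of the filled slot most similar to an unlabeled feature."""
        scores = [
            cosine(feature, slot) if ok else -math.inf
            for slot, ok in zip(self.slots, self.filled)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def enhance(self, feature: Vector, label: int, alpha: float = 0.5) -> Vector:
        """Blend a feature with its class slot to obtain an enhanced feature."""
        slot = self.slots[label]
        return [(1.0 - alpha) * f + alpha * s for f, s in zip(feature, slot)]


# Usage: labeled source images fill the slots, then a 3D model feature is
# pseudo-labeled, written into memory, and enhanced.
memory = ClassMemory(num_classes=2, dim=2)
memory.update([1.0, 0.0], 0)
memory.update([0.0, 1.0], 1)
target = [0.9, 0.2]
label = memory.pseudo_label(target)
memory.update(target, label)
enhanced = memory.enhance(target, label)
```

Because the slots move slowly under the momentum rule, a single noisy pseudo-label shifts its class centroid only slightly, which is one plausible way a memory can make adaptation progressive rather than fixed.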
3D retrieval
3D model retrieval aims to find the 3D models in a database that match a given query. Current mainstream methods can be roughly divided into two kinds: model-based 3D model retrieval methods [[8], [9], [10], [11], [12], [13], [14]] and image-based 3D model retrieval methods [3,[15], [16], [17], [18], [19], [20]].
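In the image-based setting the query is a 2D image and the database holds 3D models. Once both are embedded in a shared feature space, retrieval reduces to ranking the models by similarity to the query. The sketch below assumes cosine similarity and pre-computed embeddings; the vectors, model IDs and the `retrieve` helper are made up for illustration and do not reproduce any particular method's feature extractor.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query: Vector, gallery: Dict[str, Vector], top_k: int = 3) -> List[str]:
    """Return the IDs of the top_k 3D models most similar to the query image."""
    ranked = sorted(
        gallery, key=lambda model_id: cosine(query, gallery[model_id]), reverse=True
    )
    return ranked[:top_k]


# Hypothetical embeddings: one vector per 3D model in the database.
gallery = {
    "chair_01": [0.9, 0.1, 0.0],
    "table_07": [0.1, 0.9, 0.1],
    "chair_02": [0.8, 0.3, 0.1],
}
query_image = [1.0, 0.2, 0.0]
print(retrieve(query_image, gallery, top_k=2))  # ['chair_01', 'chair_02']
```

Cosine similarity ignores feature magnitude, which is a common choice when image and 3D model features come from different branches whose scales may differ.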
Model-based 3D model retrieval refers to a retrieval method in which the queries are 3D models. The focus of the model-based 3D model retrieval method is to find
Overview
The aim of the 2D image-based 3D model retrieval task is to retrieve the 3D models that match the query 2D images. In this task, the 2D images are defined as the source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, where $x_i^s$ is the $i$-th image and $y_i^s \in \{0, 1, \dots, J-1\}$ is its corresponding label; $n_s$ and $J$ are the number of image samples and the number of classes, respectively. The unlabeled 3D models are defined as the target domain $\mathcal{D}_t = \{x_i^t\}_{i=1}^{n_t}$, which contains $n_t$ unlabeled 3D model samples $x_i^t$. The whole framework we proposed is
Dataset
MI3DOR [48] is a public dataset containing 21,000 2D images and 7690 3D models from the same 21 categories, divided into a training set and a test set. The training set includes 10,500 2D images and 3842 3D models covering all categories, and each 3D model is represented by 12 views. The test set contains the remaining 10,500 2D images and 3848 3D models.
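Because each 3D model is represented by 12 views, its per-view features have to be combined into one descriptor before it can be compared with an image feature. The excerpt does not say how this is done; the sketch below assumes element-wise max pooling over views, as in the multi-view convolutional network work listed in the references, with mean pooling as an alternative. `view_pool` and `NUM_VIEWS` are hypothetical names.

```python
from typing import List

Vector = List[float]
NUM_VIEWS = 12  # each 3D model is represented by 12 views


def view_pool(view_features: List[Vector], mode: str = "max") -> Vector:
    """Aggregate 12 per-view feature vectors into a single model descriptor."""
    if len(view_features) != NUM_VIEWS:
        raise ValueError(f"expected {NUM_VIEWS} views, got {len(view_features)}")
    dims = range(len(view_features[0]))
    if mode == "max":  # keep the strongest response per feature dimension
        return [max(v[d] for v in view_features) for d in dims]
    if mode == "mean":  # average the responses across views
        return [sum(v[d] for v in view_features) / NUM_VIEWS for d in dims]
    raise ValueError(f"unknown mode: {mode}")


# Usage with made-up 2-D view features.
views = [[float(i), 12.0 - i] for i in range(NUM_VIEWS)]
print(view_pool(views), view_pool(views, "mean"))  # [11.0, 12.0] [5.5, 6.5]
```

Max pooling is insensitive to the order of the views, so the descriptor does not depend on which rendering camera comes first.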
MI3DOR-2 [1] contains 19,694 2D images and 3982 3D models (also represented by 12 views) from 40 categories. The training set includes 19,294 2D images and 3182 3D
Conclusion
Aiming at 2D image-based 3D model retrieval, this paper proposes a framework that incorporates a memory mechanism. To address the large domain gap in this task, the memory mechanism is built on disentangled feature learning: the extracted original visual features are disentangled to obtain domain-invariant features. The domain-invariant features learned at the beginning of training are still affected by domain divergence and cannot achieve the
CRediT authorship contribution statement
Dan Song: Conceptualization, Methodology. Yuting Ling: Data curation, Software, Writing – original draft. Tianbao Li: Investigation, Writing – review & editing, Software. Ting Zhang: Investigation, Writing – original draft. Guoqing Jin: Writing – review & editing. Junbo Guo: Software, Validation. Xuanya Li: Visualization.
Declaration of Competing Interest
None.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China (61902277), State Key Laboratory of Communication Content Cognition (Grant No. A02106), the Open Funding Project of the State Key Laboratory of Communication Content Cognition (Grant No. 20K04) and the Baidu Program.
References (56)
- Category-specific upright orientation estimation for 3D model classification and retrieval. Image Vis. Comput. (2020)
- Semi-supervised learning through adaptive Laplacian graph trimming. Image Vis. Comput. (2017)
- Dual-level embedding alignment network for 2D image-based 3D object retrieval
- A comprehensive survey of neural architecture search: challenges and solutions. ACM Comput. Surv. (2021)
- Multi-view convolutional neural networks for 3D shape recognition
- PointNet: Deep learning on point sets for 3D classification and segmentation
- Semantic consistency guided instance feature alignment for 2D image-based 3D shape retrieval
- Dynamic affinity graph construction for spectral clustering using multiple features. IEEE Trans. Neural Netw. Learn. Syst. (2018)
- Compound rank-k projections for bilinear analysis. IEEE Trans. Neural Netw. Learn. Syst. (2015)
- PointNet: A 3D convolutional neural network for real-time object class recognition
- PointNet++: Deep hierarchical feature learning on point sets in a metric space
- VoxNet: A 3D convolutional neural network for real-time object recognition
- 3D ShapeNets: A deep representation for volumetric shapes
- Orientation-boosted voxel nets for 3D object recognition
- DAN: Deep-attention network for 3D shape recognition. IEEE Trans. Image Process.
- GIFT: Towards scalable 3D shape retrieval. IEEE Trans. Multimedia
- 3D pose estimation and 3D model retrieval for objects in the wild
- Triplet-center loss for multi-view 3D object retrieval
- Angular triplet-center loss for multi-view 3D shape retrieval
- Learning barycentric representations of 3D shapes for sketch-based 3D shape retrieval
- Deep correlated joint network for 2-D image-based 3-D model retrieval. IEEE Trans. Cybern.
- GVCNN: Group-view convolutional neural networks for 3D shape recognition
- Sketch-based 3D shape retrieval using convolutional neural networks
- Learning cross-domain neural networks for sketch-based 3D shape retrieval
- Deep correlated metric learning for sketch-based 3D shape retrieval
- Image-based 3D model retrieval using manifold learning. Front. Inform. Technol. Electron. Eng.
- Hierarchical instance feature alignment for 2D image-based 3D shape retrieval
- Location field descriptors: Single image 3D model retrieval in the wild