Abstract
In this article, we propose a novel method to address the two-dimensional (2D) image-based 3D object retrieval problem. First, we extract a set of virtual views to represent each 3D object, and use a soft-attention model to weight each view and select one characteristic view per 3D object. Second, we propose a novel Holistic Generative Adversarial Network (HGAN) to solve the cross-domain feature representation problem by bringing the feature space of the virtual characteristic view closer to the feature space of real pictures. This mitigates the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of the HGAN to obtain the “virtual real image” of each 3D object, so that the characteristic view of the 3D object and the real picture share the same feature space for retrieval. To demonstrate the performance of our approach, we established a new dataset of paired 2D images and 3D objects, where the 3D objects are taken from the ModelNet40 dataset. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
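The view-selection step can be illustrated with a minimal sketch. It is not the authors' model: it assumes each virtual view has already been encoded as a feature vector, scores each view by a dot product with a hypothetical learned `query` vector, normalizes the scores with a softmax, and takes the highest-weighted view as the characteristic view. The function names and the toy feature values are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_characteristic_view(view_features, query):
    """Return (index of the highest-weighted view, all attention weights).

    view_features: one feature vector per virtual view (assumed precomputed).
    query: hypothetical learned attention vector; dot-product scoring is an
           assumption of this sketch, not a detail taken from the abstract.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in view_features]
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


# Toy example: three views of one 3D object, each a 3-dimensional feature.
views = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.3, 0.3, 0.3]]
query = [1.0, 0.0, 0.5]
best, weights = select_characteristic_view(views, query)
print(best, [round(w, 3) for w in weights])
```

In a trained system the scoring function would be learned jointly with the view encoder; the hard selection shown here is simply the arg-max of the soft weights.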
Index Terms
- HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval