HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval

Published: 16 December 2019

Abstract

In this article, we propose a novel method to address the two-dimensional (2D) image-based three-dimensional (3D) object retrieval problem. First, we extract a set of virtual views to represent each 3D object and use a soft-attention model to weight the views and select one characteristic view for each 3D object. Second, we propose a novel Holistic Generative Adversarial Network (HGAN) to solve the cross-domain feature representation problem by bringing the feature space of the virtual characteristic views closer to the feature space of real pictures. This effectively mitigates the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of the HGAN to obtain a “virtual real image” of each 3D object, so that the characteristic view of the 3D object and the real picture share the same feature space for retrieval. To demonstrate the performance of our approach, we established a new dataset of paired 2D images and 3D objects, where the 3D objects are based on the ModelNet40 dataset. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
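
The view-selection step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each virtual view is already encoded as a feature vector and that a view is scored by a dot product with a learned attention vector. The names `select_characteristic_view`, `attention_vector`, and the toy numbers are hypothetical; the abstract does not specify the scoring function.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_characteristic_view(
    view_features: Sequence[Sequence[float]],
    attention_vector: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return (index of the highest-weighted view, attention weights).

    Assumption: each view is scored by a dot product with a learned
    attention vector; the paper's actual scoring function may differ.
    """
    if not view_features:
        raise ValueError("need at least one view")
    scores = [
        sum(f * a for f, a in zip(feature, attention_vector))
        for feature in view_features
    ]
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


# Toy example: three 2-D view features (made-up numbers, not from the paper).
views = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
attention = [1.0, 0.0]  # hypothetical learned parameters
best_index, view_weights = select_characteristic_view(views, attention)
```

In this sketch the weights sum to one across the views of an object, and the view with the largest weight is taken as the characteristic view that would then be passed to the generative model.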

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 15, Issue 4
November 2019
322 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3376119
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2019
Accepted: 01 July 2019
Revised: 01 April 2019
Received: 01 November 2018
Published in TOMM Volume 15, Issue 4

Author Tags

  1. 3D retrieval
  2. characteristic view
  3. generative adversarial networks
  4. image-based

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China

Article Metrics

  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 20 Feb 2025

Cited By

  • (2025) T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance. IEEE Transactions on Pattern Analysis and Machine Intelligence 47:1 (172-189). DOI: 10.1109/TPAMI.2024.3463753. Online publication date: 1-Jan-2025
  • (2025) Point‐PC: Point cloud completion guided by prior knowledge via causal inference. CAAI Transactions on Intelligence Technology. DOI: 10.1049/cit2.12379. Online publication date: 6-Jan-2025
  • (2024) Feature Skeletons-Based Model Retrieval for Bolus Shaping in Cancer Care. 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech) (1-6). DOI: 10.23919/SpliTech61897.2024.10612542. Online publication date: 25-Jun-2024
  • (2024) Cross-Modal Contrastive Learning with a Style-Mixed Bridge for Single Image 3D Shape Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12 (1-24). DOI: 10.1145/3689645. Online publication date: 30-Aug-2024
  • (2024) Revolutionizing Visuals: The Role of Generative AI in Modern Image Generation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:11 (1-22). DOI: 10.1145/3689641. Online publication date: 22-Aug-2024
  • (2024) Filling the Holes on 3D Heritage Object Surface Based on Automatic Segmentation Algorithm. Expert Systems. DOI: 10.1111/exsy.13749. Online publication date: 14-Oct-2024
  • (2024) Towards automating stocktaking in warehouses. Procedia Computer Science 232:C (1437-1445). DOI: 10.1016/j.procs.2024.01.142. Online publication date: 1-Jan-2024
  • (2023) A Social Recommendation Model Based on Basic Spatial Mapping and Bilateral Generative Adversarial Networks. Entropy 25:10 (1388). DOI: 10.3390/e25101388. Online publication date: 28-Sep-2023
  • (2023) Self-supervised Image-based 3D Model Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2 (1-18). DOI: 10.1145/3548690. Online publication date: 23-Mar-2023
  • (2023) Cross-Domain Image-Object Retrieval Based on Weighted Optimal Transport. IEEE Transactions on Multimedia 25 (9557-9571). DOI: 10.1109/TMM.2023.3254889. Online publication date: 2023
