HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval

Authors:

Yuting Su

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 4

Article No.: 101, Pages 1 - 24

https://doi.org/10.1145/3344684

Published: 16 December 2019

Abstract

In this article, we propose a novel method for two-dimensional (2D) image-based three-dimensional (3D) object retrieval. First, we extract a set of virtual views to represent each 3D object, and use a soft-attention model to weight each view and select one characteristic view per object. Second, we propose a novel Holistic Generative Adversarial Network (HGAN) to solve the cross-domain feature representation problem by pulling the feature space of the virtual characteristic view closer to the feature space of real pictures, which mitigates the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of the HGAN to obtain a “virtual real image” of each 3D object, so that the characteristic view of the 3D object and the real picture share the same feature space for retrieval. To evaluate our approach, we established a new dataset of paired 2D images and 3D objects, where the 3D objects are based on the ModelNet40 dataset. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
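The view-selection and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attention scoring function or the similarity measure, so the dot-product scoring against a learned query vector, the cosine similarity, and all function names (`select_characteristic_view`, `rank_objects`) are assumptions made for illustration. The HGAN itself is omitted; the sketch assumes features have already been mapped into a shared space.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_characteristic_view(
    view_features: Sequence[Sequence[float]],
    attention_query: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return (index of the highest-weighted view, all attention weights).

    Assumption: each view is scored by a dot product with a learned query
    vector; the abstract does not state the actual scoring function.
    """
    scores = [
        sum(f * q for f, q in zip(feature, attention_query))
        for feature in view_features
    ]
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_objects(
    query_feature: Sequence[float],
    object_features: Sequence[Sequence[float]],
) -> List[int]:
    """Rank 3D objects (by index) from most to least similar to the 2D query.

    Assumption: query and object features already live in a shared space.
    """
    sims = [cosine_similarity(query_feature, f) for f in object_features]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
```

For example, `select_characteristic_view([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]], [1.0, 0.0])` picks the second view (index 1), because its dot-product score is the largest. In a full pipeline, the feature of that view would be the input to the generator before ranking.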

Index Terms

HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 4

November 2019

322 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3376119

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2019

Accepted: 01 July 2019

Revised: 01 April 2019

Received: 01 November 2018

Published in TOMM Volume 15, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China

Article Metrics

Total Citations: 26
Total Downloads: 407
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Feb 2025

Cited By

Nie W, Chen R, Wang W, Lepri B, Sebe N. (2025). T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance. IEEE Transactions on Pattern Analysis and Machine Intelligence 47:1 (172-189). DOI: 10.1109/TPAMI.2024.3463753. Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1109/TPAMI.2024.3463753
Gao X, Jiao C, Chen R, Wang W, Nie W. (2025). Point‐PC: Point cloud completion guided by prior knowledge via causal inference. CAAI Transactions on Intelligence Technology. DOI: 10.1049/cit2.12379. Online publication date: 6-Jan-2025
https://doi.org/10.1049/cit2.12379
Rui L, Peng Q. (2024). Feature Skeletons-Based Model Retrieval for Bolus Shaping in Cancer Care. 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech) (1-6). DOI: 10.23919/SpliTech61897.2024.10612542. Online publication date: 25-Jun-2024
https://doi.org/10.23919/SpliTech61897.2024.10612542
Song D, Huo S, Fu X, Zhang C, Li W, Liu A. (2024). Cross-Modal Contrastive Learning with a Style-Mixed Bridge for Single Image 3D Shape Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:12 (1-24). DOI: 10.1145/3689645. Online publication date: 30-Aug-2024
https://dl.acm.org/doi/10.1145/3689645
Bansal G, Nawal A, Chamola V, Herencsar N. (2024). Revolutionizing Visuals: The Role of Generative AI in Modern Image Generation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:11 (1-22). DOI: 10.1145/3689641. Online publication date: 22-Aug-2024
https://dl.acm.org/doi/10.1145/3689641
Van Nguyen S, Le S, Tran M, Le S. (2024). Filling the Holes on 3D Heritage Object Surface Based on Automatic Segmentation Algorithm. Expert Systems. DOI: 10.1111/exsy.13749. Online publication date: 14-Oct-2024
https://doi.org/10.1111/exsy.13749
Daios A, Xanthopoulos A, Folinas D, Kostavelis I. (2024). Towards automating stocktaking in warehouses. Procedia Computer Science 232:C (1437-1445). DOI: 10.1016/j.procs.2024.01.142. Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1016/j.procs.2024.01.142
Zhang S, Zhang N, Wang W, Liu Q, Li J. (2023). A Social Recommendation Model Based on Basic Spatial Mapping and Bilateral Generative Adversarial Networks. Entropy 25:10 (1388). DOI: 10.3390/e25101388. Online publication date: 28-Sep-2023
https://doi.org/10.3390/e25101388
Song D, Zhang C, Zhao X, Wang T, Nie W, Li X, Liu A. (2023). Self-supervised Image-based 3D Model Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2 (1-18). DOI: 10.1145/3548690. Online publication date: 23-Mar-2023
https://dl.acm.org/doi/10.1145/3548690
Hu N, Huang X, Li W, Li X, Liu A. (2023). Cross-Domain Image-Object Retrieval Based on Weighted Optimal Transport. IEEE Transactions on Multimedia 25 (9557-9571). DOI: 10.1109/TMM.2023.3254889. Online publication date: 2023
https://doi.org/10.1109/TMM.2023.3254889
