
HGAN: Holistic Generative Adversarial Networks for Two-dimensional Image-based Three-dimensional Object Retrieval

Published: 16 December 2019

Abstract

In this article, we propose a novel method for two-dimensional (2D) image-based three-dimensional (3D) object retrieval. First, we extract a set of virtual views to represent each 3D object and apply a soft-attention model to weight the views and select one characteristic view per object. Second, we propose a novel Holistic Generative Adversarial Network (HGAN) that addresses the cross-domain feature representation problem by aligning the feature space of the virtual characteristic view with that of the real image, effectively mitigating the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of the HGAN to produce a “virtual real image” for each 3D object, so that the characteristic view of the 3D object and the real image share the same feature space for retrieval. To evaluate our approach, we built a new dataset of paired 2D images and 3D objects, in which the 3D objects are drawn from the ModelNet40 dataset. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.
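To make the pipeline concrete, the following is a minimal PyTorch sketch of the two mechanisms the abstract describes: soft-attention weighting over per-view features to select a characteristic view, and an adversarial objective that pushes translated view features toward the real-image feature space. All module names, layer sizes, and the plain GAN loss used here are illustrative assumptions; the paper’s actual holistic loss terms and network architecture are not reproduced.

```python
# Illustrative sketch only; architecture and losses are assumptions,
# not the authors' HGAN implementation.
import torch
import torch.nn as nn

class SoftAttentionViewSelector(nn.Module):
    """Scores rendered views of a 3D object and picks a characteristic view."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, view_feats):                      # (batch, n_views, feat_dim)
        weights = torch.softmax(self.score(view_feats).squeeze(-1), dim=1)
        idx = weights.argmax(dim=1)                     # hard pick of one view;
        chosen = view_feats[torch.arange(view_feats.size(0)), idx]
        return chosen, weights                          # soft weights train the scorer

class Generator(nn.Module):
    """Maps a characteristic-view feature toward the real-image feature space."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Distinguishes real-image features from translated view features."""
    def __init__(self, feat_dim=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

bce = nn.BCELoss()

def adversarial_step(G, D, view_feat, real_feat, opt_g, opt_d):
    """One standard GAN update aligning the two feature domains."""
    fake = G(view_feat)
    # Discriminator: real-image features -> 1, translated view features -> 0.
    d_loss = (bce(D(real_feat), torch.ones(real_feat.size(0), 1)) +
              bce(D(fake.detach()), torch.zeros(view_feat.size(0), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator so both domains share one feature space.
    g_loss = bce(D(fake), torch.ones(view_feat.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

After such training, retrieval would amount to nearest-neighbor matching between a query image’s feature and each 3D object’s translated feature G(chosen) in the shared space.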



Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 4 (November 2019), 322 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3376119

            Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 November 2018
• Revised: 1 April 2019
• Accepted: 1 July 2019
• Published: 16 December 2019

