DOI: 10.1145/3397271.3401054
research-article

Pairwise View Weighted Graph Network for View-based 3D Model Retrieval

Published: 25 July 2020

Abstract

View-based 3D model retrieval has become an important task in both computer vision and machine learning. Although deep learning methods have achieved excellent performance on view-based 3D model retrieval, the intrinsic correlation among the multiple views of a 3D model and the differing degrees of discrimination of those views have not been effectively exploited. To obtain a more efficient feature descriptor for 3D model retrieval, in this work we propose the pairwise view weighted graph network (PVWGN) for view-based 3D model retrieval, in which non-local graph layers are embedded into the network architecture to automatically mine the intrinsic relationships among the multiple views of a 3D model. Furthermore, a view weighted layer is employed in the PVWGN to adaptively assign a weight to each view according to its aggregated information. In addition, a pairwise discrimination loss function is designed to improve the feature discrimination of the 3D model. Most importantly, these three components are integrated into a unified framework. Extensive experimental results on the ModelNet40 and ModelNet10 3D model retrieval datasets show that PVWGN outperforms state-of-the-art methods on the 3D model retrieval task, with mAPs of 93.2% and 96.2%, respectively.
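The abstract does not give the layer equations, so the following NumPy sketch is only an illustration of the two ideas it names: a non-local layer that lets each view aggregate information from all other views, followed by a weighted pooling of views into one descriptor. The softmax affinity, the residual connection, the linear scoring vector, and every name here (`non_local_view_layer`, `weighted_view_pooling`, `w_theta`, `w_phi`, `w_g`, `w_score`) are assumptions made for this example, not the authors' formulation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def non_local_view_layer(views, w_theta, w_phi, w_g):
    """Refine each view feature with information from all views.

    views: (V, D) array, one row per rendered view (assumed layout).
    The pairwise affinity is a softmax over dot products of two linear
    embeddings; this is an assumed form, not taken from the paper.
    """
    theta = views @ w_theta                     # (V, K) query embedding
    phi = views @ w_phi                         # (V, K) key embedding
    g = views @ w_g                             # (V, D) value embedding
    affinity = softmax(theta @ phi.T, axis=1)   # (V, V), each row sums to 1
    return views + affinity @ g                 # residual connection


def weighted_view_pooling(views, w_score):
    """Pool V view features into one descriptor with learned view weights."""
    scores = views @ w_score                    # (V,) one score per view
    weights = softmax(scores, axis=0)           # (V,) positive, sums to 1
    return weights @ views, weights             # (D,) descriptor, (V,) weights


# Toy data: 12 views with 16-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
num_views, dim, inner = 12, 16, 8
views = rng.normal(size=(num_views, dim))
w_theta = rng.normal(scale=0.1, size=(dim, inner))
w_phi = rng.normal(scale=0.1, size=(dim, inner))
w_g = rng.normal(scale=0.1, size=(dim, dim))
w_score = rng.normal(scale=0.1, size=dim)

refined = non_local_view_layer(views, w_theta, w_phi, w_g)
descriptor, weights = weighted_view_pooling(refined, w_score)
print(descriptor.shape, weights.shape)
```

In a trained network the projection matrices would be learned jointly with the backbone. The pairwise discrimination loss is left out of this sketch because the abstract does not define it.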

Supplementary Material

MP4 File (3397271.3401054.mp4)
To obtain a more efficient feature descriptor for 3D model retrieval, this work proposes a novel pairwise view weighted graph network (PVWGN) for view-based 3D model retrieval. Specifically, non-local graph layers are first embedded into the network architecture to automatically mine the intrinsic relationships among the multiple views of a 3D model. Then, a view weighted layer is employed in the PVWGN to adaptively assign a weight to each view according to its aggregated information. Finally, a pairwise discrimination loss function is designed to improve the feature discrimination of the 3D model. Most importantly, these three components are integrated into a unified end-to-end framework. Extensive experimental results on the ModelNet40 and ModelNet10 3D model retrieval datasets show that PVWGN outperforms state-of-the-art methods on the 3D model retrieval task, with mAPs of 93.2% and 96.2%, respectively.
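The results above are reported as mAP (mean average precision). As a reminder of how that metric is commonly computed for retrieval, here is a minimal sketch. It assumes each query returns a full ranking of gallery items, that an item is relevant when its class label matches the query's, and that the function names and toy labels are hypothetical; the exact evaluation protocol used in the paper is not stated on this page.

```python
def average_precision(ranked_labels, query_label):
    """Average of the precision values at each rank where a relevant item appears."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    # A query with no relevant item in the ranking contributes 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries):
    """queries: list of (ranked_labels, query_label) pairs."""
    return sum(average_precision(r, q) for r, q in queries) / len(queries)


# Toy rankings (made-up labels, not ModelNet results).
toy_queries = [
    (["chair", "table", "chair"], "chair"),  # precisions 1/1 and 2/3
    (["sofa", "bed"], "bed"),                # precision 1/2
]
toy_map = mean_average_precision(toy_queries)
print(round(toy_map, 4))
```

Because every gallery item is ranked here, dividing by the number of relevant items retrieved is the same as dividing by the total number of relevant items; evaluations that truncate the ranking have to choose between the two.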

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. non-local graph network
  2. pairwise network architecture
  3. view weighted layer
  4. view-based 3D model retrieval

Qualifiers

  • Research-article

Funding Sources

  • Tianjin New Generation Artificial Intelligence Major Program
  • Young creative team in universities of Shandong Province
  • NSFC
  • National Key R&D Program of China
  • Jinan 20 projects in universities

Conference

SIGIR '20
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2024) MulmQA: Multimodal Question Answering for Database Alarm. 2024 5th Information Communication Technologies Conference (ICTC), 291-296. DOI: 10.1109/ICTC61510.2024.10602092. Online publication date: 10-May-2024
  • (2022) Dynamic Graph Reasoning for Conversational Open-Domain Question Answering. ACM Transactions on Information Systems 40:4, 1-24. DOI: 10.1145/3498557. Online publication date: 11-Jan-2022
  • (2021) Collocation and Try-on Network. Proceedings of the 29th ACM International Conference on Multimedia, 309-317. DOI: 10.1145/3474085.3475691. Online publication date: 17-Oct-2021
  • (2021) Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding. Proceedings of the 29th ACM International Conference on Multimedia, 695-703. DOI: 10.1145/3474085.3475234. Online publication date: 17-Oct-2021
  • (2021) A Numerical Comparison of Iterative Algorithms for Inconsistency Reduction in Pairwise Comparisons. IEEE Access 9, 62553-62561. DOI: 10.1109/ACCESS.2021.3074274. Online publication date: 2021
  • (2020) Texture Semantically Aligned with Visibility-aware for Partial Person Re-identification. Proceedings of the 28th ACM International Conference on Multimedia, 3771-3779. DOI: 10.1145/3394171.3413833. Online publication date: 12-Oct-2020
  • (2020) Domain-Specific Alignment Network for Multi-Domain Image-Based 3D Object Retrieval. Proceedings of the 28th ACM International Conference on Multimedia, 3496-3504. DOI: 10.1145/3394171.3413655. Online publication date: 12-Oct-2020
