
Off-the-shelf CNN features for 3D object retrieval

Multimedia Tools and Applications

Abstract

Effective feature representation is crucial to view-based 3D object retrieval (V3OR). Most previous works employed hand-crafted features to represent the views of each object. Although deep learning based methods have shown excellent performance in many vision tasks, it is hard for them to achieve such performance in unsupervised 3D object retrieval. In this paper, we propose to combine off-the-shelf deep models with a graph model to retrieve unseen objects. By employing powerful deep classification models trained on millions of images, we obtain significant improvements over state-of-the-art methods. We validate that CNNs readily trained on other domains can greatly strengthen the representational power of objects' views. In addition, we analyze the representational abilities of different fully connected layers for V3OR, and propose to employ multigraph learning to fuse the deep features of different layers. An autoencoder is then explored to improve retrieval speed to a large extent. Experiments on two popular datasets demonstrate the effectiveness of the proposed method.
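The full pipeline is described in the article body; purely as an illustrative sketch of the retrieval setting the abstract outlines, the toy fragment below represents each 3D object by off-the-shelf CNN features of its rendered views and ranks a gallery by cosine similarity. Mean pooling stands in here for the paper's graph model, and all data are synthetic.

```python
import numpy as np

def object_descriptor(view_feats):
    """Pool per-view CNN features into one object descriptor.
    Mean pooling is a simplification; the paper ranks with a graph model."""
    v = view_feats.mean(axis=0)
    return v / np.linalg.norm(v)

# Toy gallery: 10 objects x 12 views x 4096-d off-the-shelf FC features.
rng = np.random.default_rng(0)
gallery = [rng.normal(size=(12, 4096)) for _ in range(10)]
query = rng.normal(size=(12, 4096))

G = np.stack([object_descriptor(v) for v in gallery])
q = object_descriptor(query)
ranking = np.argsort(-(G @ q))  # cosine similarity, best match first
```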



Acknowledgments

This work is supported by the State Key Development Program of Basic Research of China (973 Program) (No. 2013CB733105), the National Natural Science Foundation of China (No. 61472103), and the Key Program (No. 61133003).

Author information


Corresponding author

Correspondence to Dong Wang.

Appendix: The experimental results with CaffeNet

Figure 5 presents the performance of CaffeNet's five FC layers (including the 'ReLU' layers) and of the softmax output over the 1000 ILSVRC categories. The results on both datasets show that the 22nd layer (the classification output) performs rather poorly for V3OR, as it does for VggNet. In addition, the other five FC-layer features show little variance in performance, so no single layer stands out as the representative view feature.

Fig. 5 Performance with various layers: (a) ETH, (b) NTU
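As a minimal sketch of extracting these off-the-shelf FC-layer features from a rendered view, the fragment below uses torchvision's AlexNet as a stand-in for CaffeNet (a closely related architecture); the view filename is hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# AlexNet stands in for CaffeNet (a closely related architecture).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Hook the FC layers; in torchvision's AlexNet the classifier indices
# 1, 4 and 6 correspond to fc6, fc7 and fc8 (the 1000-way ILSVRC output).
features = {}
def grab(name):
    def fn(module, inp, out):
        features[name] = out.detach().squeeze(0)
    return fn

for name, idx in [("fc6", 1), ("fc7", 4), ("fc8", 6)]:
    model.classifier[idx].register_forward_hook(grab(name))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "view_000.png" is a hypothetical rendered view of a 3D object.
img = preprocess(Image.open("view_000.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    model(img)
# features["fc6"] / features["fc7"] are 4096-d; features["fc8"] is 1000-d.
```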

As with VggNet, we employ the multi-graph learning method to fuse the FC-layer features of CaffeNet. Single-graph learning scores and the final fusion results are presented in Tables 6 and 7.

Table 6 Performance comparison of different methods for several measures on the ETH dataset
Table 7 Performance comparison of different methods for several measures on the NTU dataset

Interestingly, the fusion results on the ETH dataset do not obtain the highest score on every indicator, which is also the case with VggNet; we offer the same explanation as in Section 3.2 of the paper.
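The paper's multi-graph learning jointly optimizes relevance scores and graph weights; as a simplified sketch only, the fragment below builds one similarity graph per FC layer, ranks with the closed-form manifold-ranking solution on each graph, and averages the per-graph scores with uniform weights (whereas the actual method learns these weights). The feature matrices here are synthetic.

```python
import numpy as np

def manifold_ranking(features, query_idx, alpha=0.99, sigma=1.0):
    """Graph-based ranking on one feature type (one graph per FC layer)."""
    # Affinity matrix with a Gaussian kernel on squared Euclidean distances.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized graph: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))
    # Closed-form ranking scores: f = (I - alpha * S)^{-1} y.
    y = np.zeros(len(features))
    y[query_idx] = 1.0
    return np.linalg.solve(np.eye(len(features)) - alpha * S, y)

# One graph per layer; uniform fusion weights are a simplification.
rng = np.random.default_rng(0)
layer_feats = {"fc6": rng.normal(size=(50, 16)), "fc7": rng.normal(size=(50, 16))}
scores = sum(manifold_ranking(f, query_idx=0) for f in layer_feats.values()) / len(layer_feats)
ranking = np.argsort(-scores)  # retrieved objects, most similar first
```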

Figure 6 demonstrates a dimensionality-reduction trend similar to that of VggNet. Moreover, the CaffeNet feature reduced to a very low dimension (about 8) performs comparably to the original feature.

Fig. 6 Performance via feature dimensionality reduction: (a) ETH, (b) NTU
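To illustrate the kind of dimensionality reduction reported above (compressing a 4096-d FC feature to roughly 8 dimensions), here is a minimal autoencoder sketch; the layer sizes and training setup are assumptions, not the paper's configuration, and the input features are synthetic.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Autoencoder compressing 4096-d FC features to an 8-d code."""
    def __init__(self, in_dim=4096, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

feats = torch.randn(256, 4096)          # stand-in for extracted FC features
model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                    # reconstruction training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(feats), feats)
    loss.backward()
    opt.step()
codes = model.encoder(feats).detach()   # 8-d codes used for fast retrieval
```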

About this article

Cite this article

Wang, D., Wang, B., Zhao, S. et al. Off-the-shelf CNN features for 3D object retrieval. Multimed Tools Appl 77, 19833–19849 (2018). https://doi.org/10.1007/s11042-017-5413-3

