
Off-the-shelf CNN features for 3D object retrieval

Multimedia Tools and Applications

Abstract

Effective feature representation is crucial to view-based 3D object retrieval (V3OR). Most previous works employed hand-crafted features to represent the views of each object. Although deep learning based methods have shown excellent performance in many vision tasks, it is hard for them to achieve such performance in unsupervised 3D object retrieval. In this paper, we propose to combine off-the-shelf deep models with a graph model to retrieve unseen objects. By employing powerful deep classification models trained on millions of images, we obtain significant improvements over state-of-the-art methods. We validate that CNNs readily trained on other domains can greatly strengthen the representational power of objects' views. In addition, we analyze the representational abilities of different fully connected layers for V3OR, and propose to employ multigraph learning to fuse the deep features of different layers. An autoencoder is then explored to improve retrieval speed to a large extent. Experiments on two popular datasets demonstrate the effectiveness of the proposed method.
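The full pipeline is described in the article body; purely as an illustrative sketch of the retrieval setting the abstract outlines, the toy fragment below represents each 3D object by off-the-shelf CNN features of its rendered views and ranks a gallery by cosine similarity. Mean pooling stands in here for the paper's graph model, and all data are synthetic.

```python
import numpy as np

def object_descriptor(view_feats):
    """Pool per-view CNN features into one object descriptor.
    Mean pooling is a simplification; the paper ranks with a graph model."""
    v = view_feats.mean(axis=0)
    return v / np.linalg.norm(v)

# Toy gallery: 10 objects x 12 views x 4096-d off-the-shelf FC features.
rng = np.random.default_rng(0)
gallery = [rng.normal(size=(12, 4096)) for _ in range(10)]
query = rng.normal(size=(12, 4096))

G = np.stack([object_descriptor(v) for v in gallery])
q = object_descriptor(query)
ranking = np.argsort(-(G @ q))  # cosine similarity, best match first
```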



Acknowledgments

This work is supported by the State Key Development Program of Basic Research of China (973 Program) (No. 2013CB733105), the National Natural Science Foundation of China (No. 61472103), and the Key Program (No. 61133003).

Author information


Corresponding author

Correspondence to Dong Wang.

Appendix: The experimental results with CaffeNet

Figure 5 presents the performance of CaffeNet's five FC layers (including the 'ReLU' layers) and of the softmax output over the 1000 ILSVRC categories. The results on both datasets show that the 22nd layer (the classification output) performs rather poorly for V3OR, as it does for VggNet. In addition, the other five FC-layer features show little variance in performance, so no single layer stands out as the representative view feature.

Fig. 5 Performance with various layers: (a) ETH, (b) NTU
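As a minimal sketch of extracting these off-the-shelf FC-layer features from a rendered view, the fragment below uses torchvision's AlexNet as a stand-in for CaffeNet (a closely related architecture); the view filename is hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# AlexNet stands in for CaffeNet (a closely related architecture).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()

# Hook the FC layers; in torchvision's AlexNet the classifier indices
# 1, 4 and 6 correspond to fc6, fc7 and fc8 (the 1000-way ILSVRC output).
features = {}
def grab(name):
    def fn(module, inp, out):
        features[name] = out.detach().squeeze(0)
    return fn

for name, idx in [("fc6", 1), ("fc7", 4), ("fc8", 6)]:
    model.classifier[idx].register_forward_hook(grab(name))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "view_000.png" is a hypothetical rendered view of a 3D object.
img = preprocess(Image.open("view_000.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    model(img)
# features["fc6"] / features["fc7"] are 4096-d; features["fc8"] is 1000-d.
```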

As with VggNet, we employ the multi-graph learning method to fuse the FC-layer features of CaffeNet. Single-graph learning scores and the final fusion results are presented in Tables 6 and 7.

Table 6 Performance comparison of different methods for several measures on the ETH dataset
Table 7 Performance comparison of different methods for several measures on the NTU dataset

Interestingly, the fusion results on the ETH dataset do not obtain the highest score on every indicator, which is also the case with VggNet; we offer the same explanation as in Section 3.2 of the paper.
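The paper's multi-graph learning jointly optimizes relevance scores and graph weights; as a simplified sketch only, the fragment below builds one similarity graph per FC layer, ranks with the closed-form manifold-ranking solution on each graph, and averages the per-graph scores with uniform weights (whereas the actual method learns these weights). The feature matrices here are synthetic.

```python
import numpy as np

def manifold_ranking(features, query_idx, alpha=0.99, sigma=1.0):
    """Graph-based ranking on one feature type (one graph per FC layer)."""
    # Affinity matrix with a Gaussian kernel on squared Euclidean distances.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized graph: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    S = W / np.sqrt(np.outer(d, d))
    # Closed-form ranking scores: f = (I - alpha * S)^{-1} y.
    y = np.zeros(len(features))
    y[query_idx] = 1.0
    return np.linalg.solve(np.eye(len(features)) - alpha * S, y)

# One graph per layer; uniform fusion weights are a simplification.
rng = np.random.default_rng(0)
layer_feats = {"fc6": rng.normal(size=(50, 16)), "fc7": rng.normal(size=(50, 16))}
scores = sum(manifold_ranking(f, query_idx=0) for f in layer_feats.values()) / len(layer_feats)
ranking = np.argsort(-scores)  # retrieved objects, most similar first
```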

Figure 6 demonstrates a dimensionality-reduction trend similar to that of VggNet. Moreover, the CaffeNet feature reduced to a very low dimension (about 8) performs comparably to the original feature.

Fig. 6 Performance via feature dimensionality reduction: (a) ETH, (b) NTU
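To illustrate the kind of dimensionality reduction reported above (compressing a 4096-d FC feature to roughly 8 dimensions), here is a minimal autoencoder sketch; the layer sizes and training setup are assumptions, not the paper's configuration, and the input features are synthetic.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Autoencoder compressing 4096-d FC features to an 8-d code."""
    def __init__(self, in_dim=4096, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, in_dim),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

feats = torch.randn(256, 4096)          # stand-in for extracted FC features
model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                    # reconstruction training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(feats), feats)
    loss.backward()
    opt.step()
codes = model.encoder(feats).detach()   # 8-d codes used for fast retrieval
```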

About this article

Cite this article

Wang, D., Wang, B., Zhao, S. et al. Off-the-shelf CNN features for 3D object retrieval. Multimed Tools Appl 77, 19833–19849 (2018). https://doi.org/10.1007/s11042-017-5413-3

