Learning Attentive and Hierarchical Representations for 3D Shape Recognition

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12360)

Abstract

This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR). Unlike existing multi-view-based methods, HEAR provides a unified framework that addresses both multi-view redundancy and single-view incompleteness. Specifically, HEAR first employs a hybrid attention (HA) module, which consists of a view-agnostic attention (VAA) block and a view-specific attention (VSA) block. These two blocks jointly explore distinct but complementary spatial saliency of the local features in each single-view image. Subsequently, a multi-granular view pooling (MVP) module aggregates the multi-view features at different granularities in a coarse-to-fine manner. The resulting feature set implicitly carries hierarchical relations and is therefore projected into a Hyperbolic space via a Hyperbolic embedding. A hierarchical representation is then learned by Hyperbolic multi-class logistic regression based on Hyperbolic geometry. Experimental results show that HEAR outperforms state-of-the-art approaches on three 3D shape recognition tasks: generic 3D shape retrieval, 3D shape classification, and sketch-based 3D shape retrieval.
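
To make the idea of projecting Euclidean features into a Hyperbolic space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Poincaré ball model with curvature -c, which the abstract does not specify, and uses the standard exponential map at the origin together with the ball's geodesic distance. The function names `expmap0` and `poincare_distance` and the parameter `c` are hypothetical choices made for illustration.

```python
import math


def expmap0(v, c=1.0, eps=1e-9):
    """Map a Euclidean (tangent) vector v into the Poincare ball of
    curvature -c using the exponential map at the origin.
    Assumption: the Poincare ball model; the abstract names no model."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < eps:
        # The origin maps to itself.
        return [0.0 for _ in v]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball
    of radius 1 / sqrt(c)."""
    def sq_norm(u):
        return sum(a * a for a in u)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


# Hypothetical 2-D feature vectors standing in for pooled view features.
coarse = expmap0([0.3, 0.4])   # Euclidean norm 0.5
fine = expmap0([0.9, 1.2])     # Euclidean norm 1.5, same direction
origin = [0.0, 0.0]

print(poincare_distance(origin, coarse))
print(poincare_distance(origin, fine))
```

In this model every embedded point stays inside the unit ball, while its distance from the origin grows with the norm of the input feature. That is one reason hyperbolic spaces are often used for tree-like, hierarchical data: points near the origin can play the role of coarse, general nodes and points near the boundary the role of fine-grained ones.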

Author information

Corresponding author

Correspondence to Jie Qin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, J., Qin, J., Shen, Y., Liu, L., Zhu, F., Shao, L. (2020). Learning Attentive and Hierarchical Representations for 3D Shape Recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12360. Springer, Cham. https://doi.org/10.1007/978-3-030-58555-6_7

  • DOI: https://doi.org/10.1007/978-3-030-58555-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58554-9

  • Online ISBN: 978-3-030-58555-6

  • eBook Packages: Computer Science, Computer Science (R0)
