Sequential learning for sketch-based 3D model retrieval

  • Regular Paper
Multimedia Systems

Abstract

Sketch-based 3D model retrieval suffers from a large visual discrepancy between 3D models and 2D sketches. Most existing methods directly project samples from both modalities into the same semantic embedding space to alleviate this discrepancy. We argue that learning the two modalities simultaneously restricts the discriminability of the 3D model representation, resulting in inferior retrieval results. In this work, we propose a novel sequential learning (SL) framework for sketch-based 3D model retrieval that learns the 3D model representation and the 2D sketch representation separately and sequentially. Specifically, the SL framework is composed of two modules: a 3D model network (3DMN) and a 2D sketch network (2DSN). First, we train 3DMN with a discriminative loss formulated only on 3D models to promote discrimination. Then, the learned representations of 3D models guide 2DSN to learn discriminative 2D sketch representations. In this second phase, we further mine the implicit fine-grained class information of 3D models with unsupervised clustering algorithms, and formulate an alignment loss between 2D sketches and the corresponding fine-grained class centers of 3D models. Extensive experiments on three large-scale benchmark datasets for 3D model retrieval validate the efficacy of the proposed SL framework and the fine-grained class representations.
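The second phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the 3D model embeddings from the first phase are already computed and frozen, uses plain k-means as a stand-in for the unspecified clustering algorithm, and assumes each sketch is aligned to the nearest sub-class center of its own class with a squared Euclidean distance. All function names and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the paper's clustering)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers


def fine_grained_centers(model_embeddings, labels, k):
    """Cluster the frozen 3D model embeddings of each class into sub-class centers."""
    by_class = {}
    for emb, lab in zip(model_embeddings, labels):
        by_class.setdefault(lab, []).append(emb)
    return {lab: kmeans(embs, min(k, len(embs))) for lab, embs in by_class.items()}


def alignment_loss(sketch_embeddings, sketch_labels, centers):
    """Mean squared distance from each sketch to the nearest center of its class
    (the nearest-center assignment is an assumption of this sketch)."""
    total = 0.0
    for emb, lab in zip(sketch_embeddings, sketch_labels):
        total += min(sq_dist(emb, c) for c in centers[lab])
    return total / len(sketch_embeddings)


# Toy data: the "chair" class has two visually distinct sub-groups.
model_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0],
                    [50.0, 50.0], [52.0, 50.0]]
model_labels = ["chair", "chair", "chair", "chair", "table", "table"]
centers = fine_grained_centers(model_embeddings, model_labels, k=2)

sketches = [[0.0, 1.0], [10.0, 13.0]]
print(alignment_loss(sketches, ["chair", "chair"], centers))
```

In a real training loop the sketch embeddings would come from 2DSN and the loss would be minimized by gradient descent while the 3D centers stay fixed. Using several centers per class, rather than one, is what lets a sketch align with the sub-group of 3D models it most resembles.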

Author information

Corresponding author

Correspondence to Zhihui Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by B-K Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants No. 61772108, No. 61572096, and No. 61733002, and by the Dalian Science and Technology Innovation Fund under Grant No. 2019J11CY004.

About this article

Cite this article

Yang, H., Tian, Y., Yang, C. et al. Sequential learning for sketch-based 3D model retrieval. Multimedia Systems 28, 761–778 (2022). https://doi.org/10.1007/s00530-021-00871-w

  • DOI: https://doi.org/10.1007/s00530-021-00871-w
