Shape2Vec: semantic-based descriptors for 3D shapes, sketches and images

Published: 05 December 2016

Abstract

Convolutional neural networks have been successfully used to compute shape descriptors, or to jointly embed shapes and sketches in a common vector space. We propose a novel approach that leverages both labeled 3D shapes and the semantic information contained in their labels to generate semantically meaningful shape descriptors. A neural network is trained to generate shape descriptors that lie close to a vector representation of the shape class, given a vector space of words. The method extends easily to range scans, hand-drawn sketches and images, which makes cross-modal retrieval possible without designing a different method for each query type. We show that sketch-based shape retrieval using semantic-based descriptors outperforms the state of the art by large margins, and that mesh-based retrieval returns results of higher relevance to the query than current deep shape descriptors.
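
The idea of descriptors that lie close to the word vector of their class, and of retrieval across modalities in that shared space, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 3-dimensional toy word vectors, the use of cosine distance as the training objective and ranking measure, and the names `WORD_VECTORS`, `semantic_loss`, `retrieve` and `SHAPE_DB` are hypothetical and are not taken from the paper, whose actual network and loss are not described in the abstract.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy word vector space (hypothetical values; a real one would come
# from a pretrained word embedding with hundreds of dimensions).
WORD_VECTORS = {
    "chair": [0.9, 0.1, 0.0],
    "table": [0.7, 0.6, 0.1],
    "airplane": [0.0, 0.2, 0.95],
}

def semantic_loss(descriptor, label):
    """Assumed training objective: distance between a descriptor
    produced by the network and the word vector of its class label."""
    return cosine_distance(descriptor, WORD_VECTORS[label])

def retrieve(query_descriptor, database, k=2):
    """Rank database items by distance to the query descriptor.
    The query may come from any modality (sketch, image, mesh),
    because all descriptors live in the same word vector space."""
    ranked = sorted(database,
                    key=lambda item: cosine_distance(query_descriptor, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical descriptors of 3D shapes already mapped into the space.
SHAPE_DB = [
    ("chair_mesh_01", [0.88, 0.15, 0.02]),
    ("table_mesh_01", [0.65, 0.62, 0.12]),
    ("airplane_mesh_01", [0.05, 0.18, 0.93]),
]

# Hypothetical descriptor computed from a hand-drawn airplane sketch.
sketch_descriptor = [0.1, 0.25, 0.9]
print(retrieve(sketch_descriptor, SHAPE_DB, k=1))
```

Because the sketch descriptor and the shape descriptors share one space, the same `retrieve` function would serve a query from any modality, which is the property the abstract refers to as cross-modal retrieval.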

Supplementary Material

ZIP File (a208-tasse.zip)
Supplemental file.

Published In

ACM Transactions on Graphics, Volume 35, Issue 6
November 2016
1045 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/2980179
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 2D sketch
  2. CNN
  3. deep learning
  4. depthmap
  5. semantic-based
  6. shape descriptor
  7. word vector space

Qualifiers

  • Research-article

Cited By

  • (2024) Do Similar Objects Have Similar Grasp Positions? Sensors 24:23 (7735). DOI: 10.3390/s24237735. Online publication date: 3-Dec-2024
  • (2024) Sketch-Based 3D Shape Retrieval With Multi-View Fusion Transformer. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3005-3009). DOI: 10.1109/ICASSP48485.2024.10446103. Online publication date: 14-Apr-2024
  • (2024) D2GL: Dual-level dual-scale graph learning for sketch-based 3D shape retrieval. Pattern Recognition 156 (110768). DOI: 10.1016/j.patcog.2024.110768. Online publication date: Dec-2024
  • (2024) SKD-SBSR: Structural Knowledge Distillation for Sketch-Based 3D Shape Retrieval. Knowledge-Based Systems (112891). DOI: 10.1016/j.knosys.2024.112891. Online publication date: Dec-2024
  • (2023) Survey of lightweighting methods of huge 3D models for online Web3D visualization. Virtual Reality & Intelligent Hardware 5:5 (395-406). DOI: 10.1016/j.vrih.2020.02.002. Online publication date: Oct-2023
  • (2023) HDA L: Hierarchical Domain-Augmented Adaptive Learning for sketch-based 3D shape retrieval. Knowledge-Based Systems 264 (110302). DOI: 10.1016/j.knosys.2023.110302. Online publication date: Mar-2023
  • (2023) PAGML: Precise Alignment Guided Metric Learning for sketch-based 3D shape retrieval. Image and Vision Computing 136 (104756). DOI: 10.1016/j.imavis.2023.104756. Online publication date: Aug-2023
  • (2021) SketchZooms: Deep Multi‐view Descriptors for Matching Line Drawings. Computer Graphics Forum 40:1 (410-423). DOI: 10.1111/cgf.14197. Online publication date: 20-Jan-2021
  • (2021) Sketch Augmentation-Driven Shape Retrieval Learning Framework Based on Convolutional Neural Networks. IEEE Transactions on Visualization and Computer Graphics 27:8 (3558-3570). DOI: 10.1109/TVCG.2020.2975504. Online publication date: 30-Jun-2021
  • (2021) Active Arrangement of Small Objects in 3D Indoor Scenes. IEEE Transactions on Visualization and Computer Graphics 27:4 (2250-2264). DOI: 10.1109/TVCG.2019.2949295. Online publication date: 1-Apr-2021
