Visualizable or Non-visualizable? Exploring the Visualizability of Concepts in Multi-modal Knowledge Graph

Jiang, Xueyao; Li, Ailisi; Liang, Jiaqing; Liu, Bang; Xie, Rui; Wu, Wei; Li, Zhixu; Xiao, Yanghua

doi:10.1007/978-3-031-00123-9_14


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13245)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

An important task in image-based multi-modal knowledge graph construction is grounding concepts to their corresponding images. However, existing research overlooks the intrinsic properties of different concepts. Specifically, some concepts cannot be characterized visually, such as mind, texture and session cookie. In this work, we call such concepts non-visualizable concepts (NVCs), and we call concepts that have clear and specific visual representations, such as dog, visualizable concepts (VCs). We propose a new task of distinguishing VCs from NVCs, which existing efforts have rarely tackled. To address it, we propose a multi-modal classification model that combines concept-related features from both texts and images. Because training samples are scarce, especially for NVCs, we use concepts selected from ImageNet as VC instances and propose a webly-supervised method to obtain a small set of NVC instances. With this small training set, we train the model using a modified version of the basic two-step positive-unlabeled (PU) learning strategy. Extensive evaluations demonstrate that our model significantly outperforms a variety of baseline approaches.
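The abstract names a two-step positive-unlabeled (PU) learning strategy over combined text and image features but does not describe the authors' modification, so the sketch below shows only the generic two-step scheme it starts from. Step 1 treats every unlabeled concept as a provisional negative, fits a classifier, and keeps the lowest-scoring fraction of unlabeled concepts as "reliable negatives"; step 2 retrains on labeled positives versus those reliable negatives. Everything here is an assumption for illustration: `fuse_features` (plain concatenation of a text vector and an image vector), the NumPy logistic regression, the `neg_quantile` cut-off, and the synthetic data are hypothetical, and the abstract does not say which class (VC or NVC) plays the positive role.

```python
import numpy as np


def fuse_features(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: concatenate per-concept text and image vectors."""
    return np.concatenate([text_emb, image_emb], axis=1)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 500):
    """Batch gradient-descent logistic regression; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # per-sample gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def two_step_pu(X_pos: np.ndarray, X_unl: np.ndarray, neg_quantile: float = 0.2):
    """Generic two-step PU learning (not the paper's modified variant)."""
    # Step 1: provisionally label every unlabeled sample as negative, fit a
    # classifier, and keep the lowest-scoring fraction as reliable negatives.
    X1 = np.vstack([X_pos, X_unl])
    y1 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unl))])
    w1, b1 = train_logreg(X1, y1)
    scores = sigmoid(X_unl @ w1 + b1)
    X_rn = X_unl[scores <= np.quantile(scores, neg_quantile)]
    # Step 2: fit the final classifier on positives vs. reliable negatives only.
    X2 = np.vstack([X_pos, X_rn])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_rn))])
    w2, b2 = train_logreg(X2, y2)
    return w2, b2, X_rn


# Toy usage on synthetic data: 4-d "text" and 3-d "image" features per concept.
rng = np.random.default_rng(0)


def make_concepts(center: float, n: int) -> np.ndarray:
    return fuse_features(rng.normal(center, 0.5, (n, 4)), rng.normal(center, 0.5, (n, 3)))


pos = make_concepts(1.0, 50)       # labeled examples of the positive class
unl_pos = make_concepts(1.0, 30)   # unlabeled, secretly positive
unl_neg = make_concepts(-1.0, 30)  # unlabeled, secretly negative
w, b, reliable_neg = two_step_pu(pos, np.vstack([unl_pos, unl_neg]))
print(len(reliable_neg), sigmoid(unl_neg @ w + b).mean())
```

Keeping only the most confidently negative unlabeled samples in step 1 is what lets the scheme work from a small labeled set: the final classifier never sees unlabeled positives mislabeled as negatives, at the cost of discarding most of the unlabeled data.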


Notes

1. http://image-net.org/challenges/LSVRC/2012/browse-synsets.



Acknowledgement

This work is supported by the National Key Research and Development Project (No. 2020AAA0109302), the Shanghai Science and Technology Innovation Action Plan (No. 19511120400), the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103) and the National Natural Science Foundation of China (Grant No. 62072323).

Author information

Authors and Affiliations

Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
Xueyao Jiang, Ailisi Li, Jiaqing Liang, Zhixu Li & Yanghua Xiao
Mila and DIRO, Université de Montréal, Montréal, Québec, Canada
Bang Liu
Meituan, Shanghai, China
Rui Xie & Wei Wu
Fudan-Aishu Cognitive Intelligence Joint Research Center, Shanghai, China
Yanghua Xiao


Corresponding authors

Correspondence to Zhixu Li or Yanghua Xiao.

Editor information

Editors and Affiliations

Indian Institute of Technology Kanpur, Kanpur, India
Arnab Bhattacharya
National University of Singapore, Singapore, Singapore
Janice Lee Mong Li
University of California, Santa Barbara, Santa Barbara, CA, USA
Divyakant Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Mukesh Mohania
Ashoka University, Sonepat, Haryana, India
Anirban Mondal
Indraprastha Institute of Information Technology Delhi, New Delhi, India
Vikram Goyal
University of Aizu, Aizu, Japan
Rage Uday Kiran


About this paper

Cite this paper

Jiang, X. et al. (2022). Visualizable or Non-visualizable? Exploring the Visualizability of Concepts in Multi-modal Knowledge Graph. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13245. Springer, Cham. https://doi.org/10.1007/978-3-031-00123-9_14


DOI: https://doi.org/10.1007/978-3-031-00123-9_14
Published: 08 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00122-2
Online ISBN: 978-3-031-00123-9
eBook Packages: Computer Science, Computer Science (R0)
