Abstract
An important task in image-based multi-modal knowledge graph construction is grounding concepts to their corresponding images. However, existing research overlooks an intrinsic property of concepts: some concepts cannot be characterized visually, such as mind, texture, and session cookie. In this work, we define such concepts as non-visualizable concepts (NVCs), and concepts that have clear and specific visual representations, such as dog, as visualizable concepts (VCs). We propose the new task of distinguishing VCs from NVCs, which has rarely been addressed in existing work. To solve it, we propose a multi-modal classification model that combines concept-related features from both text and images. Because training samples are scarce, especially for NVCs, we select concepts in ImageNet as VC instances and propose a webly-supervised method to obtain a small set of NVC instances. Based on this small training set, we modify the basic two-step positive-unlabeled learning strategy to train the model. Extensive evaluations demonstrate that our model significantly outperforms a variety of baseline approaches.
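The abstract names a two-step positive-unlabeled (PU) learning strategy but does not spell out the authors' modification. As a rough illustration only, the sketch below shows the generic form of two-step PU learning with the "spy" heuristic: step 1 extracts reliable negatives from the unlabeled pool, and step 2 trains a classifier on positives versus those reliable negatives. Everything here is an assumption rather than the paper's method: the hypothetical `fuse` helper (plain concatenation) stands in for the paper's combination of text and image features, pure-Python logistic regression stands in for the real classifier, toy Gaussian vectors replace real text and image embeddings, `spy_frac` and `quantile` are hypothetical hyperparameters, and which class (VC or NVC) plays the positive role is left open.

```python
import math
import random


def fuse(text_vec, image_vec):
    """Hypothetical early fusion: concatenate a concept's text and image features."""
    return list(text_vec) + list(image_vec)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.1, seed=0):
    """Plain SGD logistic regression; a stand-in for any probabilistic classifier."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def two_step_pu(P, U, spy_frac=0.2, quantile=0.05, seed=0):
    """Generic two-step PU learning with spies (not the paper's modified variant)."""
    rng = random.Random(seed)
    P = list(P)
    rng.shuffle(P)
    n_spy = max(1, int(len(P) * spy_frac))
    spies, rest = P[:n_spy], P[n_spy:]
    # Step 1: remaining positives (label 1) vs. unlabeled + spies (all label 0).
    m1 = train_logreg(rest + U + spies, [1] * len(rest) + [0] * (len(U) + n_spy))
    # Spies are known positives, so a low quantile of their scores is the cut-off
    # below which an unlabeled item is taken as a reliable negative.
    spy_scores = sorted(predict(m1, s) for s in spies)
    t = spy_scores[int(quantile * (len(spy_scores) - 1))]
    RN = [u for u in U if predict(m1, u) < t]
    if not RN:
        raise ValueError("no reliable negatives found; try a higher quantile")
    # Step 2: final classifier on all positives vs. reliable negatives.
    return train_logreg(P + RN, [1] * len(P) + [0] * len(RN)), RN


# Toy usage with hypothetical 2-d "text" and 2-d "image" embeddings.
rng = random.Random(42)


def toy_concept(mu):
    return fuse([rng.gauss(mu, 0.3) for _ in range(2)],
                [rng.gauss(mu, 0.3) for _ in range(2)])


P = [toy_concept(1.0) for _ in range(20)]  # labeled positives
U = [toy_concept(1.0) for _ in range(10)] + [toy_concept(-1.0) for _ in range(20)]
model, RN = two_step_pu(P, U)
print(len(RN), predict(model, [1.0] * 4) > predict(model, [-1.0] * 4))
```

The spy threshold is the main design choice in this generic scheme: because spies are known positives hidden among the unlabeled items, unlabeled items that score below almost all spies are unlikely to be positive, which is what makes them usable as negatives in step 2.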
Acknowledgement
This work is supported by National Key Research and Development Project (No. 2020AAA0109302), Shanghai Science and Technology Innovation Action Plan (No. 19511120400), Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103) and National Natural Science Foundation of China (Grant No. 62072323).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, X. et al. (2022). Visualizable or Non-visualizable? Exploring the Visualizability of Concepts in Multi-modal Knowledge Graph. In: Bhattacharya, A., et al. Database Systems for Advanced Applications. DASFAA 2022. Lecture Notes in Computer Science, vol 13245. Springer, Cham. https://doi.org/10.1007/978-3-031-00123-9_14
DOI: https://doi.org/10.1007/978-3-031-00123-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00122-2
Online ISBN: 978-3-031-00123-9
eBook Packages: Computer Science, Computer Science (R0)