
Multimodal Named Entity Recognition with Image Attributes and Image Knowledge

  • Conference paper
  • First Online:
Database Systems for Advanced Applications (DASFAA 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12682)


Abstract

Multimodal named entity recognition is an emerging task that uses both textual and visual information to detect named entities and identify their entity types. Existing efforts are often flawed in two aspects. First, they tend to ignore the natural bias of the visual guidance brought by the image. Second, they do not further explore the knowledge contained in the image. In this paper, we propose a novel neural network model that introduces both image attributes and image knowledge to help improve named entity extraction. While the image attributes are high-level abstract information about an image that can be labelled by a pre-trained model based on ImageNet, the image knowledge can be obtained from a general encyclopedic knowledge graph with multimodal information, such as DBpedia and YAGO. Our empirical study, conducted on a real-world data collection, demonstrates the effectiveness of our approach compared with several state-of-the-art approaches.


Notes

  1. Available at: https://keras.io/api/applications/#inceptionv3.

  2. Available at: https://github.com/seatgeek/fuzzywuzzy.

  3. Available at: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip.
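One step the footnoted tools support is linking a predicted image attribute label to an entity name in the knowledge graph. Footnote 2 points to fuzzywuzzy for this kind of fuzzy string matching; as a minimal stdlib-only sketch (the entity names and the `best_entity_match` helper below are hypothetical, and Python's `difflib` stands in for fuzzywuzzy's similarity score):

```python
# Hypothetical sketch: link a predicted image-attribute label to the most
# similar knowledge-graph entity name via fuzzy string matching.
# difflib's SequenceMatcher ratio (0..1) stands in for fuzzywuzzy here.
from difflib import SequenceMatcher

def best_entity_match(label, entity_names, threshold=0.6):
    """Return the entity name most similar to `label`, or None if no
    candidate reaches the similarity threshold."""
    scored = [
        (SequenceMatcher(None, label.lower(), name.lower()).ratio(), name)
        for name in entity_names
    ]
    score, name = max(scored)
    return name if score >= threshold else None

# Toy candidate set standing in for entity names pulled from DBpedia/YAGO.
entities = ["Barack Obama", "White House", "Basketball", "New York City"]
print(best_entity_match("basket ball", entities))  # → Basketball
print(best_entity_match("zzzz", entities))         # → None
```

In practice fuzzywuzzy's `fuzz.ratio`/`process.extractOne` would replace the helper above, and the candidate set would come from the knowledge graph rather than a hard-coded list.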


Acknowledgment

This research is partially supported by National Key R&D Program of China (No. 2018AAA0101900), the Priority Academic Program Development of Jiangsu Higher Education Institutions, National Natural Science Foundation of China (Grant No. 62072323, 61632016), Natural Science Foundation of Jiangsu Province (No. BK20191420), and the Suda-Toycloud Data Intelligence Joint Laboratory.

Author information

Corresponding author

Correspondence to Zhixu Li.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, D., Li, Z., Gu, B., Chen, Z. (2021). Multimodal Named Entity Recognition with Image Attributes and Image Knowledge. In: Jensen, C.S., et al. (eds.) Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol. 12682. Springer, Cham. https://doi.org/10.1007/978-3-030-73197-7_12


  • DOI: https://doi.org/10.1007/978-3-030-73197-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73196-0

  • Online ISBN: 978-3-030-73197-7

  • eBook Packages: Computer Science (R0)
