On development of multimodal named entity recognition using part-of-speech and mixture of experts

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Multimodal Named Entity Recognition (MNER) is a fundamental task in natural language processing for social media posts. Current MNER models fail to handle the relation between text and image entities, which introduces textual noise, image noise, and even multimodal noise during processing. In this paper, we first introduce part-of-speech (POS) information, which is used to eliminate non-entity words and filter textual noise. A POS-based gated cross-modal attention network is then established to learn the textual and visual representations precisely and remove the image noise. Finally, a Mixture-of-Experts (MoE) module is proposed for multimodal integration, which improves the effectiveness of named entity identification and filters the multimodal noise. We evaluate the proposed model on the Twitter dataset, and the experimental results provide strong evidence of state-of-the-art performance.
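
The abstract does not specify how the Mixture-of-Experts module is built, so the following is only a generic sketch of the idea of gated expert fusion, not the architecture proposed in the paper. It assumes a token is represented by a single feature vector whose first half comes from text and second half from the image; the names `mixture_of_experts`, `text_expert`, `image_expert`, and `gate_weights` are hypothetical and chosen purely for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mixture_of_experts(
    features: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate_weights: Sequence[Vector],
) -> Vector:
    """Fuse expert outputs using softmax gate weights computed from the input.

    gate_weights holds one weight vector per expert (an assumed linear gate).
    """
    gate = softmax([dot(w, features) for w in gate_weights])
    outputs = [expert(features) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]


# Hypothetical token: two text features followed by two image features.
token = [0.5, -1.0, 2.0, 0.0]


def text_expert(x: Vector) -> Vector:
    # Toy expert that looks only at the text half of the vector.
    return x[:2]


def image_expert(x: Vector) -> Vector:
    # Toy expert that looks only at the image half of the vector.
    return x[2:]


# All-zero gate weights give every expert the same score, so the gate is uniform.
gate_weights = [[0.0] * 4, [0.0] * 4]
fused = mixture_of_experts(token, [text_expert, image_expert], gate_weights)
```

With a learned gate, the weights would shift toward whichever expert is more reliable for a given token, which is one plausible way such a module could down-weight a noisy modality.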

Funding

This work was supported by the National Statistical Science Research Project of China under Grant No. 2016LY98, the Characteristic Innovation Projects of Guangdong Colleges and Universities under Grant No. 2018KTSCX049, and the Science and Technology Plan Project of Guangzhou under Grant Nos. 202102080258 and 201903010013.

Author information

Corresponding author

Correspondence to Yun Xue.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, J., Xue, Y., Zhang, H. et al. On development of multimodal named entity recognition using part-of-speech and mixture of experts. Int. J. Mach. Learn. & Cyber. 14, 2181–2192 (2023). https://doi.org/10.1007/s13042-022-01754-w
