UAMNer: uncertainty-aware multimodal named entity recognition in social media posts

Abstract

Named Entity Recognition (NER) on social media is challenging because posts are typically short and noisy. Recent work has explored different ways of incorporating visual information from the attached image to improve NER on social media, with considerable success. However, existing methods ignore a common scenario on social media: the image does not always match the posted text, so irrelevant images can introduce noise into these models. In this paper, we propose UAMNer, a novel uncertainty-aware framework for multimodal NER on social media, which combines visual features with text only when the textual information is insufficient, thereby suppressing noise from irrelevant images. Specifically, we propose a two-stage label refinement framework for multimodal NER in social media posts. Given a multimodal post, we first use a Bayesian neural network to produce candidate labels from the text. If the candidate labels have high uncertainty, we then use a multimodal transformer to refine the labels with textual and visual features. Experiments on two public datasets, Twitter-2015 and Twitter-2017, show that the proposed method outperforms state-of-the-art methods.
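The two-stage, uncertainty-gated inference described above can be summarised in code. The sketch below is a minimal illustration, not the authors' implementation: the module names (text_ner, multimodal_ner) and the entropy threshold are assumptions, and uncertainty is approximated here with Monte Carlo dropout, one common Bayesian-neural-network approximation.

```python
import torch

@torch.no_grad()
def predict_with_uncertainty(text_ner, tokens, n_samples=10):
    """Run the text-only tagger several times with dropout active (MC dropout)
    and return averaged labels plus an entropy-based uncertainty score."""
    text_ner.train()  # keep dropout layers stochastic at inference time
    probs = torch.stack([text_ner(tokens).softmax(-1) for _ in range(n_samples)])
    mean_probs = probs.mean(0)              # (seq_len, n_labels)
    labels = mean_probs.argmax(-1)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-9).log()).sum(-1)
    return labels, entropy.max().item()     # worst-token entropy as sentence-level uncertainty

@torch.no_grad()
def uamner_inference(text_ner, multimodal_ner, tokens, image_feats, threshold=0.5):
    """Stage 1: text-only Bayesian tagger. Stage 2: only when uncertainty is high,
    refine the labels with a multimodal transformer over text + visual features."""
    labels, uncertainty = predict_with_uncertainty(text_ner, tokens)
    if uncertainty > threshold:
        labels = multimodal_ner(tokens, image_feats).argmax(-1)
    return labels
```

Gating on uncertainty means the visual branch is consulted only for posts the text-only model finds ambiguous, which is how the framework suppresses noise from unrelated images.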



Notes

  1. https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip

  2. https://download.pytorch.org/models/resnet152-b121ed2d.pth
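For reference, note 1 is Google's BERT-base cased checkpoint (12 layers, 768 hidden units, 12 heads) and note 2 is torchvision's ResNet-152 weights. Below is a hedged sketch of loading equivalent backbones, assuming the HuggingFace transformers and torchvision packages rather than the original checkpoints used by the authors.

```python
import torch
from torchvision.models import resnet152
from transformers import BertModel, BertTokenizer

# Note 2: torchvision's ResNet-152 weights (resnet152-b121ed2d.pth).
visual_encoder = resnet152(pretrained=True)
visual_encoder.fc = torch.nn.Identity()   # expose 2048-d pooled visual features

# Note 1: cased_L-12_H-768_A-12 corresponds to the "bert-base-cased" model.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
text_encoder = BertModel.from_pretrained("bert-base-cased")
```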


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61871278 and U1836118, and by the Sichuan Science and Technology Program (No. 2018HH0143).

Author information

Corresponding author

Correspondence to Xiaohai He.



Cite this article

Liu, L., Wang, M., Zhang, M. et al. UAMNer: uncertainty-aware multimodal named entity recognition in social media posts. Appl Intell 52, 4109–4125 (2022). https://doi.org/10.1007/s10489-021-02546-5
