DOI: 10.1145/3488560.3498450

Pretraining Multi-modal Representations for Chinese NER Task with Cross-Modality Attention

Published: 15 February 2022

Abstract

Named Entity Recognition (NER) aims to identify pre-defined entities in unstructured text. Compared with English NER, Chinese NER faces additional challenges: ambiguity in entity boundary recognition, because there are no explicit delimiters between Chinese characters, and the out-of-vocabulary (OOV) problem caused by rare Chinese characters. Moreover, previous studies ignore two important features specific to the Chinese language: glyphs and phonetics, both of which carry rich semantic information. To overcome these issues by exploiting the linguistic potential of Chinese as a logographic language, we present MPM-CNER (Multi-modal Pretraining Model for Chinese NER), which learns multi-modal representations of Chinese semantics, glyphs, and phonetics via four pretraining tasks: Radical Consistency Identification (RCI), Glyph Image Classification (GIC), Phonetic Consistency Identification (PCI), and Phonetic Classification Modeling (PCM). A novel cross-modality attention mechanism then fuses these multi-modal features for further improvement. Experimental results show that our method outperforms state-of-the-art baselines on four benchmark datasets, and an ablation study verifies the effectiveness of the pretrained multi-modal representations.

Supplementary Material

MP4 File (WSDM22-fp396_DOI_10_1145_3488560_3498450.mp4)
This video is a presentation of the paper "Pretraining Multi-modal Representations for Chinese NER Task with Cross-Modality Attention" at WSDM 2022. In the video, we introduce our method, a novel multi-modal pretraining model for Chinese NER with a cross-modality attention mechanism that fuses Chinese semantics, glyphs, and phonetics. The experimental results verify that our method outperforms previous state-of-the-art baselines and demonstrate the effectiveness of the multi-modal representations, shedding light on exploiting linguistic knowledge for Chinese NER.

    Published In

    WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    February 2022
    1690 pages
    ISBN: 9781450391320
    DOI: 10.1145/3488560

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Chinese named entity recognition
    2. cross-modality attention
    3. multi-modal representations
    4. pre-training model

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Jiangsu Province Science & Technology Research Grant
    • Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China
    • National Key R&D Program of China

    Conference

    WSDM '22

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
