Neurocomputing

Volume 423, 29 January 2021, Pages 200-206
Hypergraph network model for nested entity mention recognition

https://doi.org/10.1016/j.neucom.2020.09.077

Highlights

  • We present a hypergraph network model for nested entity mention recognition.

  • We recognize nested entities by tagging hyperedges instead of nodes.

  • We propose a theorem that makes hyperedges easy to denote in the program.

  • We solve the data imbalance problem and reduce the computing cost.

Abstract

We propose a hypergraph network (HGN) model to recognize nested entity mentions in texts. This model can learn representations for the sequence structures of natural languages and for the hypergraph structures of nested entity mentions. Mainstream methods recognize an entity mention by separately tagging the words or the gaps between words, which may complicate the problem and is not favorable for capturing the overall features of the mention. To solve these issues, the HGN model treats each entity mention as a whole and tags it with one label. We represent each sentence as a hypergraph, in which nodes represent words and hyperedges represent entity mentions. Thus, entity mention recognition (EMR) is transformed into a task of classifying the hyperedges. The HGN model first uses encoders to extract features and learn a hypergraph representation, and then recognizes entity mentions by tagging every hyperedge. Experiments on three standard datasets demonstrate that our model outperforms the previous models for nested EMR. We openly release the source code at https://github.com/nlplab-ie/HGN.

Introduction

An entity is an object or a set of objects in the world, and an entity mention is a reference to an entity in texts. Entity mentions may be indicated by nouns, noun phrases, or pronouns [1]. Entity mention recognition (EMR) is the task of identifying these references and classifying them into predefined entity types such as person, location, organization, etc. Some named entity recognition (NER) tasks, such as those on CoNLL2003 [2], do not include pronoun recognition. To distinguish our work from them, this paper uses the term EMR rather than NER. EMR is helpful for relation extraction, information retrieval, question answering and other natural language processing (NLP) tasks [3]. Much research on EMR has achieved great success [4], [5]. However, most of it focuses on normal EMR and fails to handle nested EMR [6]. Nested entity mentions are common in reality [6]: in two corpora drawn from web logs, news and biomedical literature, 17% of the entity mentions in the GENIA corpus are nested in another mention, and 35% of the sentences in the ACE2005 corpus contain nested entity mentions. Therefore, nested EMR has received much attention, and some effective models have been proposed to solve it. These models include the linear-chain CRFs with the cascading technique [7], the tree-structure CRF parser [8], the CRF-based hypergraph model [9], [10], the CRF-based multigraph model [11], the LSTM-based hypergraph model [6] and the head-driven model [12]. The hypergraph-based methods are effective for modeling nested structures and perform well on nested EMR. A hypergraph is a generalization of a graph in which a hyperedge can connect any number of nodes. The two examples in Fig. 1 illustrate the hypergraph structures of nested entity mentions. Each sentence is represented as a mention hypergraph, in which nodes represent words and hyperedges represent entity mentions.

The work [9] introduced the notion of mention hypergraphs and recognized nested entity mentions by tagging the nodes of hypergraphs. In contrast, this paper treats nested EMR as a problem of tagging the hyperedges of hypergraphs. We propose an HGN model that encodes nodes and hyperedges to learn the representations of mention hypergraphs. Compared with the CRF-based methods, our model does not depend on a hand-crafted feature set; it extracts useful features automatically. An entity mention may contain one or more contiguous words, which we also call an entity mention chunk. Most models recognize entity mentions by separately tagging words or the gaps between words, while our model considers an entity mention chunk as a whole and tags it with one label. Tagging whole entity mention chunks has three advantages.

The first advantage is that the HGN model avoids multi-label problems, i.e., the problems of tagging a word with multiple labels [13]. The multi-label problems in EMR can be divided into two kinds, which we explain using the BIESO tagging scheme as an example.

  • (i) The first kind is the problem of tagging the same word at different positions with different labels. For instance, in the sentences “Donald John Trump was born in …”, “Trump grew up in…”, and “Trump Tower located in New York”, the word “Trump” is tagged with E_PER (the end of a person entity) in the first sentence, S_PER (a single-word person entity) in the second, and B_FAC (the beginning of a facility entity) in the third. This problem may make the task more complex, although it can be solved by capturing the semantic information of the context.

  • (ii) The second kind is the problem of tagging a word at a single position with different labels. For instance, in the nested EMR task, the word “Trump” in “Trump Tower” is tagged with two labels, S_PER and B_FAC. This can lead to an exponential number of possible label combinations. The second kind of multi-label problem can be avoided by the cascading technique [7]; however, that technique cannot handle nested entity mentions of the same type.

Our model tags each entity mention chunk, instead of each word, with one label, and thus does not suffer from either kind of multi-label problem. In our model, “Donald John Trump” and “Trump” are tagged with PER, and “Trump Tower” is tagged with FAC.
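The contrast between word-level BIESO tagging and chunk-level tagging can be sketched as follows. This is our own illustration, not the authors' released code; the span representation `(start, end)` is an assumption for the example.

```python
# Nested mentions in "Trump Tower located in New York":
tokens = ["Trump", "Tower", "located", "in", "New", "York"]
mentions = {(0, 1): "PER", (0, 2): "FAC", (4, 6): "GPE"}  # (start, end) spans

# Word-level BIESO tagging: collect every label each word receives.
word_labels = {i: [] for i in range(len(tokens))}
for (start, end), etype in mentions.items():
    if end - start == 1:
        word_labels[start].append("S_" + etype)   # single-word mention
    else:
        word_labels[start].append("B_" + etype)   # beginning of the mention
        for i in range(start + 1, end - 1):
            word_labels[i].append("I_" + etype)   # inside the mention
        word_labels[end - 1].append("E_" + etype) # end of the mention

# The multi-label conflict on "Trump" (node 0): S_PER and B_FAC.
print(word_labels[0])  # ['S_PER', 'B_FAC']

# Chunk-level tagging: one label per span, so no conflict is possible.
for (start, end), etype in mentions.items():
    print(" ".join(tokens[start:end]), "->", etype)
```

Here the nested mentions force two BIESO labels onto the same word at the same position, while the chunk-level scheme assigns exactly one label per candidate span.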

The second advantage is that our model does not need to add extra labels. In an EMR task with three entity types (person, location and organization) and one non-entity type, the number of types is N = 3 + 1 = 4. If entity mentions are tagged with the four positional labels B, I, E and S, then there are N = 3 × 4 + 1 = 13 combination labels in total (B_PER, I_PER, …, plus a non-entity label). That is to say, the BIESO tagging scheme turns a 4-class classification problem into a 13-class classification problem. In contrast, the HGN model adds no position labels or separator labels: each possible whole entity mention needs one label, such as PER, LOC, ORG, or NON. In other words, the HGN model does not increase the number of classes, and thus does not increase the complexity of the problem. Generally, for a classification task, the more classes there are, the more complex the problem becomes.

The third advantage is that our model can capture the overall features of entity mentions. HGN learns the representations of hypergraphs by combining two encoders in different layers: one encoder learns the node representations and the other learns the hyperedge representations. HGN treats an entity mention sequence as a whole and tags the sequence with one label. It can capture the features of the PER class, the LOC class and so on, while a method using the BIO/BIESO tagging scheme can only capture the features of the B_PER class, the I_PER class and so on. We therefore argue that our method is favorable for capturing the overall features of entity mentions.

As stated above, the proposed model, which identifies entity mentions by classifying the hyperedges of hypergraphs, has three advantages. However, enumerating all hyperedges poses two challenges. First, the large number of hyperedges incurs a great computing cost. Second, the large number of non-entity samples causes a data imbalance problem. In this paper, we use a cost-sensitive loss function to overcome the data imbalance problem and a length restriction scheme to reduce the computing cost.
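The effect of the length restriction can be sketched as follows; this is our illustration under the assumption that a candidate hyperedge is a contiguous span, so a sentence of n words has n(n+1)/2 candidates, while capping span length at L leaves at most nL:

```python
def candidate_spans(n, max_len=None):
    """All (start, end) spans of up to max_len words over n tokens."""
    spans = []
    for start in range(n):
        for end in range(start + 1, n + 1):
            if max_len is None or end - start <= max_len:
                spans.append((start, end))
    return spans

n = 50  # a hypothetical 50-word sentence
print(len(candidate_spans(n)))             # 1275 = 50*51/2
print(len(candidate_spans(n, max_len=6)))  # 285  = 50*6 - 6*5/2
```

Most of the discarded long spans are non-entity candidates, so the cap reduces both the computing cost and the number of negative samples feeding the imbalance.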

The contributions of this paper can be summarized as follows:

  • We present a hypergraph network (HGN) model for nested entity mention recognition. This model can learn node and hyperedge representations on the mention hypergraph. It is a general model for normal, crossing and nested EMR.

  • We propose a novel method which recognizes nested entity mentions by tagging hyperedges instead of nodes on hypergraphs. This method is capable of capturing the overall features of entities and avoiding the two kinds of multi-label problems.

  • We propose and prove a transformation theorem, according to which a mention hypergraph is transformed into a unique hypergraph with regular edges. This makes all the hyperedges easy to enumerate and denote in the program.

  • We adopt a cost-sensitive loss function to overcome the data imbalance problem and a length restriction scheme to reduce the computing cost. The experiments demonstrate the effectiveness of the HGN model, which achieves clear improvements over the previous models for nested EMR.

In the rest of the paper, we discuss the related work in Section 2, and then present our method in Section 3. Next, we perform experiments and analyze the results in Section 4. Finally, we conclude our work in Section 5.

Section snippets

Related work

Nested entity mention recognition has received much attention, and some models have achieved great success in this task. The work [7] compared three different techniques applied to a linear-chain CRF model for nested entity mention recognition, namely layering, cascading, and joined label tagging. Through a series of experiments, that work showed that cascading was the most effective. The work [11] proposed a novel tagging scheme which assigned labels to the gaps between words instead of

Problem definition

A hypergraph H = {X, E} is a generalization of a graph whose edges can connect one or more nodes. X is the node set and E is the hyperedge set. A q-uniform hypergraph H(q) is a hypergraph in which every hyperedge connects q nodes, so a 2-uniform hypergraph is a graph. A sentence can be represented as an undirected hypergraph H called a mention hypergraph, in which nodes represent words and hyperedges represent entity mentions. In order to take into account all possible entity mentions, we use a
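The definitions above can be encoded in a toy form (our illustration, not the released code): a hypergraph H = {X, E} as a node list plus a set of node-index tuples, where a mention over contiguous words is a hyperedge connecting those word nodes.

```python
sentence = ["Donald", "John", "Trump", "was", "born"]
X = list(range(len(sentence)))  # nodes = word positions
E = {(0, 1, 2), (2,)}           # hyperedges = mentions "Donald John Trump", "Trump"
H = (X, E)

def is_q_uniform(E, q):
    """True iff every hyperedge connects exactly q nodes."""
    return all(len(e) == q for e in E)

print(is_q_uniform(E, 3))                 # False: hyperedge sizes are 3 and 1
print(is_q_uniform({(0, 1), (2, 3)}, 2))  # True: a 2-uniform hypergraph is a graph
```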

Datasets

We conduct experiments on three standard datasets to evaluate the performance of our model. The ACE2005 corpus contains weblogs, broadcast news and newswire data in three languages; we choose the English part as a dataset. It defines 7 entity types: PER (person), ORG (organization), GPE (geographical/social/political), LOC (location), FAC (facility), WEA (weapon) and VEH (vehicle). It has 9356 sentences and 30931 entity mentions, and 3275 sentences contain nested entity mentions. For each entity

Conclusion and future work

This paper proposes a hypergraph network (HGN) model, which can learn representations for the sequence structures of natural languages and for the hypergraph structures of nested entities. Although two simple Bi-LSTMs are employed to implement the HGN model, it achieves good performance on EMR. Besides its generality and effectiveness, the main advantages of the HGN model also include flexibility. Because the node representation layer and the hyperedge

CRediT authorship contribution statement

Heyan Huang: Funding acquisition, Supervision, Resources, Project administration. Ming Lei: Investigation, Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing, Visualization, Validation. Chong Feng: Data curation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank the anonymous reviewers. This work is supported by the National Key Research and Development Program of China (No. 2017YFB0803302), the National Key Research and Development Program of China (No. 2016QY03D0602) and the National Natural Science Foundation of China (No. 61751201).

References (22)

  • R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov, S. Roukos, A statistical model for...
  • E.F.T.K. Sang et al.

    Introduction to the conll-2003 shared task: language-independent named entity recognition, in

  • G. Luo, X. Huang, C.-Y. Lin, Z. Nie, Joint entity recognition and disambiguation, in: Proceedings of the 2015...
  • J. Chiu et al.

    Named entity recognition with bidirectional lstm-cnns

    Trans. Assoc. Comput. Linguist.

    (2016)
  • J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language...
  • A. Katiyar, C. Cardie, Nested named entity recognition revisited, in: Proceedings of the 2018 Conference of the North...
  • B. Alex, B. Haddow, C. Grover, Recognising nested named entities in biomedical text, in: Biological, Translational, and...
  • J.R. Finkel, C.D. Manning, Nested named entity recognition, in: Proceedings of the 2009 Conference on Empirical Methods...
  • W. Lu, D. Roth, Joint mention extraction and classification with mention hypergraphs, in: Proceedings of the 2015...
  • A.O. Muis, W. Lu, Learning to recognize discontiguous entities, in: Proceedings of the 2016 Conference on Empirical...
  • A.O. Muis, W. Lu, Labeling gaps between words: recognizing overlapping mentions with mention separators, in:...

    Heyan Huang received her Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 1989. She is currently a professor and the dean of the School of Computer Science and Technology, Beijing Institute of Technology, China. She is also the director of Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications. Her current research interests include information extraction, machine learning, natural language processing, machine translation, and social network analysis. Various parts of her work have been published in top journals and forums including TKDE, AAAI, ACL, IJCAI and COLING.

    Ming Lei received his B.S. and M.S. degrees in computer science from Taiyuan University of Technology, China, in 2006 and 2009 respectively. He is currently a Ph.D. student in the School of Computer Science and Technology, Beijing Institute of Technology, China. His research interests include information extraction and natural language processing. His work has been published in NCAA.

    Chong Feng received his Ph.D. degree in computer science from the University of Science and Technology of China in 2005. He is currently an associate professor in the School of Computer Science and Technology, Beijing Institute of Technology, China. His current research interests include information extraction, sentiment analysis, machine learning, and natural language processing. Various parts of his work have been published in top journals and forums including AAAI, KBS, IJCAI, NAACL, Neurocomputing and COLING.
