Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web

Xiuxia Ma, Xiangfeng Luo, Subin Huang, Yike Guo
Copyright: © 2019 | Volume: 15 | Issue: 3 | Pages: 22
ISSN: 1548-3657 | EISSN: 1548-3665 | EISBN13: 9781522564348 | DOI: 10.4018/IJIIT.2019070103
Cite Article

MLA

Ma, Xiuxia, et al. "Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web." IJIIT, vol. 15, no. 3, 2019, pp. 42-63. http://doi.org/10.4018/IJIIT.2019070103

APA

Ma, X., Luo, X., Huang, S., & Guo, Y. (2019). Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web. International Journal of Intelligent Information Technologies (IJIIT), 15(3), 42-63. http://doi.org/10.4018/IJIIT.2019070103

Chicago

Ma, Xiuxia, et al. "Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web," International Journal of Intelligent Information Technologies (IJIIT) 15, no. 3 (2019): 42-63. http://doi.org/10.4018/IJIIT.2019070103

Abstract

Entity synonyms play an important role in natural language processing applications, such as query expansion and question answering. Entity synonyms exhibit three main distribution characteristics in web texts: 1) they appear in parallel structures; 2) they occur with specific patterns in sentences; and 3) they are distributed in similar contexts. The first and second characteristics rely on reliable prior knowledge and are susceptible to data sparseness, so they yield high accuracy but low recall in synonym extraction. The third may yield high recall but low accuracy, since it identifies only a somewhat loose semantic similarity. Existing methods, such as context-based and pattern-based methods, consider only one characteristic for synonym extraction and rarely take the complementarity of the three into account. To increase recall, this article proposes a novel extraction framework that combines the three characteristics for extracting synonyms from the web, in which an Entity Synonym Network (ESN) is built to incorporate synonymous knowledge. To improve accuracy, the article treats synonym detection as a ranking problem and uses the Spreading Activation model as the ranking mechanism to detect hard noise in the ESN. Experimental results show that the proposed method achieves better accuracy and recall than state-of-the-art methods.
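The abstract does not give the construction of the ESN or the parameters of the Spreading Activation model, so the following is only a minimal sketch of the general idea under stated assumptions: candidate pairs from the three characteristics are merged into one weighted graph, and activation spread from a seed entity is used to rank its neighbours. The function names, the evidence weights, the decay factor, and the example entities are all hypothetical and are not taken from the article.

```python
from collections import defaultdict

# Assumed weights per evidence source: the high-accuracy sources (parallel
# structures, sentence patterns) count more than the looser context source.
EVIDENCE_WEIGHTS = {"parallel": 1.0, "pattern": 1.0, "context": 0.4}


def build_network(evidence):
    """Merge (entity_a, entity_b, source) triples into an undirected
    weighted graph stored as a dict of dicts."""
    graph = defaultdict(dict)
    for a, b, source in evidence:
        weight = EVIDENCE_WEIGHTS[source]
        graph[a][b] = graph[a].get(b, 0.0) + weight
        graph[b][a] = graph[b].get(a, 0.0) + weight
    return graph


def spread_activation(graph, seed, steps=2, decay=0.5):
    """Spread activation from `seed` for a fixed number of steps.

    At each step every active node passes `decay` times its incoming
    energy to its neighbours, split in proportion to edge weight.
    """
    activation = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                continue
            for neighbour, weight in neighbours.items():
                next_frontier[neighbour] += decay * energy * weight / total
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


def rank_synonyms(graph, seed, steps=2, decay=0.5):
    """Return candidates for `seed`, sorted by activation, highest first."""
    scores = spread_activation(graph, seed, steps=steps, decay=decay)
    scores.pop(seed, None)  # the seed itself is not a candidate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical evidence: two aliases supported by strong sources, and one
# related but non-synonymous entity linked only through similar contexts.
evidence = [
    ("Peking University", "PKU", "pattern"),
    ("Peking University", "Beida", "parallel"),
    ("PKU", "Beida", "context"),
    ("Peking University", "Tsinghua University", "context"),
]
network = build_network(evidence)
for name, score in rank_synonyms(network, "Peking University"):
    print(f"{name}: {score:.3f}")
```

In this toy network the context-only neighbour ends up with the lowest score, which illustrates how a ranking step could push loosely related entities toward the bottom of the candidate list. A threshold or top-k cut on the ranked list would then act as the noise filter; how the article actually chooses that cut is not stated in the abstract.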