A Self-learning Rule-Based Approach for Sci-tech Compound Phrase Entity Recognition

Liu, Tingwen; Zhang, Yang; Yan, Yang; Shi, Jinqiao; Guo, Li

doi:10.1007/978-3-319-25255-1_60

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Recognizing sci-tech compound phrase entities (e.g., the names of projects, books and patents) is a fundamental task in science and technology data processing and discovery. However, little work has been done on the problem. In this paper, we first describe the characteristics that distinguish sci-tech entities from personal names and other traditional entities. We then introduce a self-learning rule-based approach to the problem. The approach consists of three stages: rule-based text truncation, BlackPOS-based text split and WhiteKey-based confirmation. Constructing the best WhiteKey list is an NP-hard problem, so we design dynamic programming and greedy algorithms to address it. Experimental results show that our approach achieves 94.78% precision, 89.19% recall and a 91.9% F₁ measure on average. Moreover, our approach is universal and orthogonal to prior named entity recognition work.
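
The abstract does not give the formulation of the WhiteKey construction problem or the details of the greedy algorithm. As a rough illustration only, the sketch below assumes the problem resembles minimum set cover: each candidate keyword confirms some set of training entities, and the goal is a short keyword list that confirms all of them. The function name `greedy_whitekey_list`, the `candidates` mapping and the sample keywords are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_whitekey_list(candidates: Dict[str, Set[int]]) -> List[str]:
    """Textbook greedy heuristic for set cover (an assumed stand-in,
    not the paper's algorithm).

    `candidates` maps a hypothetical keyword to the ids of the training
    entities that the keyword would confirm.
    """
    # Every entity that at least one keyword can confirm.
    uncovered: Set[int] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the keyword confirming the most still-uncovered entities;
        # iterating in sorted order breaks ties alphabetically.
        best = max(sorted(candidates),
                   key=lambda k: len(candidates[k] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy data: keyword -> ids of entities it confirms.
EXAMPLE = {
    "project": {1, 2, 3},
    "system": {3, 4},
    "method": {4, 5},
    "study": {5},
}

if __name__ == "__main__":
    print(greedy_whitekey_list(EXAMPLE))  # ['project', 'method']
```

The greedy choice trades optimality for speed: it can return a longer list than the true minimum, which is the usual compromise when the exact problem is NP-hard.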

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Tingwen Liu, Yang Zhang, Yang Yan, Jinqiao Shi & Li Guo

Corresponding author

Correspondence to Tingwen Liu.

Editor information

Editors and Affiliations

University of Hong Kong, Hong Kong, China
Reynold Cheng
Computer Science, Peking University, Beijing, China
Bin Cui
Advanced Digital Sciences Center (ADSC), Singapore, Singapore
Zhenjie Zhang
University of Technology, Guangzhou, China
Ruichu Cai
Guangxi University, Guangxi, China
Jia Xu

About this paper

Cite this paper

Liu, T., Zhang, Y., Yan, Y., Shi, J., Guo, L. (2015). A Self-learning Rule-Based Approach for Sci-tech Compound Phrase Entity Recognition. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_60

DOI: https://doi.org/10.1007/978-3-319-25255-1_60
Published: 13 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)
