A Self-learning Rule-Based Approach for Sci-tech Compound Phrase Entity Recognition

  • Conference paper
Web Technologies and Applications (APWeb 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9313)

Abstract

Sci-tech compound phrase entity recognition (e.g., recognizing the names of projects, books and patents) is a fundamental task in science and technology data processing and discovery. However, much less work has been done on this problem. In this paper, we first describe the characteristics that distinguish sci-tech entities from personal names and other traditional entities. Then we introduce a self-learning rule-based approach to address the problem. The approach consists of three stages, namely rule-based text truncation, BlackPOS-based text split and WhiteKey-based confirmation. Constructing the best WhiteKey list is an NP-hard problem, so we further design dynamic programming and greedy algorithms to address it. Experimental results show that our approach achieves a precision of 94.78%, a recall of 89.19% and an F1 measure of 91.9% on average. Moreover, our approach is universal and orthogonal to prior named entity recognition work.
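
The abstract does not state how the WhiteKey list problem is formulated, so the following is only an illustrative sketch of what a greedy heuristic for this kind of selection problem might look like. It assumes, hypothetically, that the task resembles minimum set cover: each candidate key "covers" the entities that contain it, and the goal is a small list of keys covering all entities. The function name `greedy_key_cover`, the sample keys and the entity ids are all invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_key_cover(candidates: Dict[str, Set[int]]) -> List[str]:
    """Greedy set-cover heuristic (an assumed stand-in, not the paper's algorithm).

    candidates maps a hypothetical key to the ids of the entities it covers.
    Repeatedly pick the key that covers the most still-uncovered entities.
    """
    # Everything that at least one key can cover.
    uncovered: Set[int] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Iterate keys in sorted order so ties are broken alphabetically.
        best = max(sorted(candidates), key=lambda k: len(candidates[k] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy data: four keys, five entities (ids 0..4).
sample = {
    "system": {0, 1, 2},
    "method": {2, 3},
    "device": {3, 4},
    "study": {4},
}
result = greedy_key_cover(sample)
print(result)
```

On this toy input the heuristic first picks `system` (three new entities) and then `device` (the remaining two). Greedy selection of this kind gives no optimality guarantee in general, which is consistent with the abstract's pairing of a greedy algorithm with a dynamic programming one.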



Author information

Corresponding author

Correspondence to Tingwen Liu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Liu, T., Zhang, Y., Yan, Y., Shi, J., Guo, L. (2015). A Self-learning Rule-Based Approach for Sci-tech Compound Phrase Entity Recognition. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_60

  • DOI: https://doi.org/10.1007/978-3-319-25255-1_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25254-4

  • Online ISBN: 978-3-319-25255-1

  • eBook Packages: Computer Science, Computer Science (R0)
