
Term Extraction Method Based on Mutual Information with Threshold Interval

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 227)

Abstract

This paper analyzes the problems that arise when mutual information is used for term extraction. To reduce their impact, it proposes a method that filters and extracts candidate terms using a threshold interval, together with a determination algorithm that finds the best upper and lower thresholds quickly and accurately through data sampling, statistics, and computation. Unlike mutual information filtration with a single threshold, the proposed method sets two thresholds and leaves the formula for calculating mutual information unchanged. Experimental results show that, under the same conditions, the proposed method significantly improves precision and F-measure.
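As a rough illustration of the idea summarized in the abstract, the following Python sketch scores adjacent word pairs with pointwise mutual information and then sorts them using a lower and an upper threshold. Everything beyond "two thresholds on an unchanged mutual information formula" is an assumption made for illustration: the function names `pmi_scores` and `filter_with_interval`, the use of adjacent token pairs as candidates, the three-way outcome (accept, undecided, reject), and the example threshold values are all hypothetical. The abstract does not describe how candidates inside the interval are treated or how the thresholds are determined, so neither is reproduced here.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Pointwise mutual information, log2(P(x,y) / (P(x) P(y))), for each adjacent token pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def filter_with_interval(scores, lower, upper):
    """Split candidates by two thresholds (assumed semantics, not taken from the paper).

    score >= upper          -> accepted
    lower <= score < upper  -> undecided
    score < lower           -> rejected
    """
    accepted, undecided, rejected = [], [], []
    for pair, score in scores.items():
        if score >= upper:
            accepted.append(pair)
        elif score < lower:
            rejected.append(pair)
        else:
            undecided.append(pair)
    return accepted, undecided, rejected


if __name__ == "__main__":
    text = "mutual information is used for term extraction and mutual information is simple"
    scores = pmi_scores(text.split())
    # Hypothetical thresholds; the paper derives its own from sampled data.
    accepted, undecided, rejected = filter_with_interval(scores, lower=1.0, upper=2.5)
    print("accepted:", accepted)
    print("undecided:", undecided)
    print("rejected:", rejected)
```

With a single threshold, the undecided group would not exist: every candidate would fall on one side of the cut. The interval makes the borderline candidates explicit, which is one plausible reading of how two thresholds could help precision.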




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bin, Y., Shichao, C. (2011). Term Extraction Method Based on Mutual Information with Threshold Interval. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23226-8_25


  • DOI: https://doi.org/10.1007/978-3-642-23226-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23225-1

  • Online ISBN: 978-3-642-23226-8

  • eBook Packages: Computer Science, Computer Science (R0)
