Neural Based Approach to Keyword Extraction from Documents

  • Conference paper
Computational Science and Its Applications — ICCSA 2003 (ICCSA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Abstract

Documents are unstructured data written in natural language. A document surrogate is the structured data into which an original document is converted so that a computer system can process it. A document surrogate is usually represented as a list of words. Because not every word in a document reflects its content, the important words related to that content must be selected from among them. Such important words are called keywords, and they are typically selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). In practice, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the words in the text. An equation that combines all of these factors becomes too complicated to apply to keyword selection. This paper proposes using a neural network model, back-propagation, in which these factors serve as features from which feature vectors are generated, and with which keywords are selected. The paper shows that back-propagation outperforms the equation in distinguishing keywords.
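The idea in the abstract can be illustrated with a minimal sketch. The abstract names the four factors (TF, IDF, word position, title inclusion) but not how they are scaled or how the network is configured, so everything below is an assumption made for illustration: the function `feature_vector`, the class `TinyMLP`, the helper `train`, the "earliness" scaling of position, the single hidden layer of three sigmoid units, the squared-error loss, and the learning rate are hypothetical choices, not the paper's actual setup.

```python
import math
import random


def feature_vector(word, doc_tokens, title_tokens, corpus):
    """Build [tf, idf, earliness, in_title] for a word that occurs in doc_tokens.

    The scaling of each feature is an assumption; the abstract only names the factors.
    """
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for tokens in corpus if word in tokens)   # documents containing the word
    idf = math.log(len(corpus) / df) if df else 0.0
    # Position feature: 1.0 if the word first appears at the very start of the document.
    earliness = 1.0 - doc_tokens.index(word) / len(doc_tokens)
    in_title = 1.0 if word in title_tokens else 0.0
    return [tf, idf, earliness, in_title]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, one sigmoid output, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic gradient step on the squared error; returns the loss before the step."""
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights as they were before this update.
        delta_hidden = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        for j in range(len(hb)):
            self.w2[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(xb)):
                row[i] -= lr * delta_hidden[j] * xb[i]
        return 0.5 * (y - target) ** 2


def train(model, samples, epochs=300, lr=0.5):
    """Train on (features, label) pairs; returns the summed loss of each epoch."""
    losses = []
    for _ in range(epochs):
        losses.append(sum(model.train_step(x, t, lr) for x, t in samples))
    return losses
```

In use, each candidate word in a document would be turned into a vector with `feature_vector`, paired with a label of 1.0 (keyword) or 0.0 (not a keyword) from hand-annotated data, and passed to `train`. The output of `forward` for a new word can then be thresholded, for example at 0.5, to decide whether to keep it as a keyword. The point of the learned model is that the weighting of the four factors is fitted from examples rather than fixed by hand in an equation.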



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jo, T. (2003). Neural Based Approach to Keyword Extraction from Documents. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_49

  • DOI: https://doi.org/10.1007/3-540-44839-X_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40155-1

  • Online ISBN: 978-3-540-44839-6

  • eBook Packages: Springer Book Archive
