Neural Based Approach to Keyword Extraction from Documents

Jo, Taeho

doi:10.1007/3-540-44839-X_49

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Documents are unstructured data written in natural language. A document surrogate is the structured data into which an original document is converted so that computer systems can process it, and it is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select the important words that relate to its content. Such important words are called keywords, and they are typically selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). In practice, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the words in the text. An equation based on all of these factors becomes too complicated to apply to keyword selection. This paper proposes using a neural network model, back propagation, in which these factors serve as features from which feature vectors are generated, and with which keywords are selected. The paper shows that back propagation outperforms the equation in distinguishing keywords.
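The idea in the abstract can be illustrated with a minimal sketch. The abstract names four factors (TF, IDF, word position, and inclusion in the title) but does not give their exact encoding or the network architecture, so everything below is an assumption made for illustration: the function `word_features`, the class `TinyBackprop`, the scaling of each feature, the single hidden layer of three units, the learning rate, and the toy documents and labels are all hypothetical and are not taken from the paper.

```python
import math
import random


def word_features(word, doc_tokens, title_tokens, corpus):
    """Build an assumed 4-dimensional feature vector for one word."""
    tf = doc_tokens.count(word) / len(doc_tokens)            # relative term frequency
    df = sum(1 for d in corpus if word in d)                 # documents containing the word
    idf = math.log(len(corpus) / df) if df else 0.0          # inverse document frequency
    # Assumed position encoding: 1.0 if the word first occurs at the very start.
    position = 1.0 - doc_tokens.index(word) / len(doc_tokens)
    in_title = 1.0 if word in title_tokens else 0.0
    return [tf, idf, position, in_title]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBackprop:
    """One hidden layer, one sigmoid output, trained by plain back propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on squared error; returns the loss before the update."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights as they were before this update.
        delta_hidden = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w2[j] -= lr * delta_out * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v
        return 0.5 * (y - target) ** 2


# Toy data (hypothetical): one labelled document plus two background documents.
doc = ["neural", "network", "selects", "keywords", "neural"]
title = ["neural", "keywords"]
corpus = [doc, ["term", "frequency"], ["neural", "model"]]
labels = {"neural": 1.0, "network": 0.0, "selects": 0.0, "keywords": 1.0}

samples = [(word_features(w, doc, title, corpus), t) for w, t in labels.items()]
net = TinyBackprop(n_in=4, n_hidden=3)
for _ in range(500):
    for x, t in samples:
        net.train_step(x, t)

for w in labels:
    _, score = net.forward(word_features(w, doc, title, corpus))
    print(w, round(score, 3))  # higher score = more keyword-like under this sketch
```

In this sketch the network learns its own weighting of the four factors from labelled words, which is the contrast the abstract draws with a hand-built equation that has to combine them explicitly.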

Author information

Authors and Affiliations

SITE, University of Ottawa, 800 King Edward Ave, Ottawa, Ontario, Canada, K1N 6N5
Taeho Jo

Editor information

Editors and Affiliations

Army High Performance Computing Research Center, USA
Vipin Kumar
Department of Computer Science, University of Calgary, Calgary, AB, T2N1N4, Canada
Marina L. Gavrilova
Heuchera Technologies Inc., 122 9251-8 Yonge Street, Richmond Hill, ON, Canada, L4C 9T3
Chih Jeng Kenneth Tan
Département d’informatique et de recherche opérationnelle, Université de Montréal, Montréal, Québec, H3C 3J7, Canada
Pierre L’Ecuyer
Department of Computer Science and Engineering, University of Minnesota, MN, 55455, USA
Vipin Kumar
The Queen’s University of Belfast, School of Computer Science, Belfast BT7 1NN, Northern Ireland, UK
Chih Jeng Kenneth Tan

About this paper

Cite this paper

Jo, T. (2003). Neural Based Approach to Keyword Extraction from Documents. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_49

DOI: https://doi.org/10.1007/3-540-44839-X_49
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive