Abstract
In this paper we first propose two new metrics to rank the relevance of words in a text. The metrics are purely statistical and language-independent, and are based on the analysis of each word's neighborhood: typically, a relevant word is more strongly connected to some of its neighbors than to others. We also present a new technique based on syllable analysis and show that, although it can serve as a metric by itself, it also improves the quality of the proposed methods and greatly improves the quality of other existing methods (such as Tf-idf). Finally, based on the rankings obtained and using another neighborhood analysis, we present a new method to decide, on a yes/no basis, whether each word is relevant.
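The abstract measures its syllable technique partly by how much it improves Tf-idf. For reference, the following is a minimal sketch of one common Tf-idf variant (relative term frequency multiplied by log(N/df)), assuming documents that are already tokenized into word lists. It is only the baseline the abstract mentions, not the authors' neighborhood metrics or syllable technique, whose details the abstract does not give; the function names `tf_idf` and `rank_words` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Score every word of every document with tf * idf.

    tf  = occurrences of the word in the document / document length
    idf = log(number of documents / number of documents containing the word)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears at least once.
    df: Counter[str] = Counter()
    for doc in documents:
        df.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            word: (count / length) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


def rank_words(doc_scores: dict[str, float]) -> list[str]:
    # Highest score first; ties broken alphabetically so the order is deterministic.
    return sorted(doc_scores, key=lambda w: (-doc_scores[w], w))
```

For example, over the three documents "the cat sat on the mat", "the dog chased the cat" and "the stock market fell sharply", the word "the" occurs in every document, so its idf (and score) is 0, while the four words unique to the third document share the top rank. A word occurring in all documents is thus pushed to the bottom regardless of its frequency, which is the behavior the authors' techniques aim to improve on.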
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ventura, J., Ferreira da Silva, J. (2007). New Techniques for Relevant Word Ranking and Extraction. In: Neves, J., Santos, M.F., Machado, J.M. (eds) Progress in Artificial Intelligence. EPIA 2007. Lecture Notes in Computer Science, vol 4874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77002-2_58
DOI: https://doi.org/10.1007/978-3-540-77002-2_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77000-8
Online ISBN: 978-3-540-77002-2
eBook Packages: Computer Science, Computer Science (R0)