Abstract
Automated searching for interesting text documents frequently suffers from a severe imbalance between positive and negative training examples, or from one class being missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine how similar new documents are to a representative collection of positive examples only. Guided by the precision-recall trade-off, a user can decide in advance how many articles, and how similar to the positive collection, should pass through the filter.
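The idea of scoring new documents only against positive examples can be illustrated with a small sketch. It is not the authors' implementation. The following are illustrative assumptions: whitespace tokenization, raw term counts, cosine similarity, the choice of the mean similarity to the k nearest positive documents as the "similarity degree", and the helper names (`vectorize`, `knn_score`, `rank_documents`, `release`). The user's cut-off appears as either a maximum number of released articles or a minimum score.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumed preprocessing: lowercase, split on whitespace, count raw term frequencies
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_score(doc_vec, positives, k):
    # Assumed similarity degree: mean cosine similarity to the k nearest positive documents
    sims = sorted((cosine(doc_vec, p) for p in positives), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_documents(new_docs, positive_docs, k=3):
    # Rank new documents by similarity to the positive collection only (no negatives needed)
    positives = [vectorize(d) for d in positive_docs]
    scored = [(knn_score(vectorize(d), positives, k), d) for d in new_docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def release(ranked, max_count=None, min_score=0.0):
    # User-chosen filter: keep documents above a similarity threshold, optionally capped in number
    passed = [(s, d) for s, d in ranked if s >= min_score]
    return passed if max_count is None else passed[:max_count]


# Usage with toy data (hypothetical documents)
positive_docs = [
    "gene expression in breast cancer",
    "breast cancer screening trial",
    "cancer gene mutation study",
]
new_docs = [
    "new breast cancer gene study",
    "football match results",
    "weather forecast for tomorrow",
]
ranked = rank_documents(new_docs, positive_docs, k=2)
for score, doc in release(ranked, min_score=0.1):
    print(f"{score:.3f}  {doc}")
```

Raising `min_score` or lowering `max_count` moves the filter toward higher precision at the cost of recall. Lowering the threshold has the opposite effect.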
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hroza, J., Žižka, J. (2005). Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_65
DOI: https://doi.org/10.1007/978-3-540-30586-6_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)