Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples

Hroza, Jiří; Žižka, Jan

doi:10.1007/978-3-540-30586-6_65

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

The task of automatically searching for interesting text documents frequently suffers from a severe imbalance between positive and negative examples, or from one class being missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine the degree of similarity of new documents to a representative collection of positive examples only. Based on the precision-recall trade-off, a user can decide in advance how many articles, and how similar, should pass through the filter.
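The abstract does not give the details of the adapted k-NN ranking, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words vectors, cosine similarity, and a score defined as the mean similarity to the k nearest positive examples; the function names, the `threshold` and `max_results` parameters, and the sample texts are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score(doc_vec, positive_vecs, k=3):
    """Assumed scoring rule: mean similarity to the k nearest positives."""
    sims = sorted((cosine(doc_vec, p) for p in positive_vecs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_and_filter(candidates, positives, k=3, threshold=0.0, max_results=None):
    """Rank candidates by similarity to the positive collection only.

    threshold   -- minimum score a document needs to pass ("how similar")
    max_results -- cap on the number of released documents ("how many")
    """
    positive_vecs = [vectorize(p) for p in positives]
    scored = [(knn_score(vectorize(c), positive_vecs, k), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:max_results]


# Hypothetical data: a small positive collection and two new documents.
positives = [
    "gene expression in cancer cells",
    "cancer cells and tumor growth",
    "tumor gene therapy",
]
candidates = ["new cancer gene therapy", "football match results"]

for score, text in rank_and_filter(candidates, positives, k=2, threshold=0.1):
    print(f"{score:.3f}  {text}")
```

Raising `threshold` or lowering `max_results` lets fewer documents through, which would typically favor precision over recall; loosening them does the opposite. No negative examples are needed at any point, since each score depends only on the positive collection.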

Author information

Authors and Affiliations

Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Jiří Hroza & Jan Žižka

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Hroza, J., Žižka, J. (2005). Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_65

DOI: https://doi.org/10.1007/978-3-540-30586-6_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
