Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information

Gustafson, Nathaniel; Pera, Maria Soledad; Ng, Yiu-Kai

doi:10.1007/978-3-540-69848-7_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5073)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Tens of thousands of news articles are posted online each day, covering topics from politics to science to current events. To help readers cope with this volume of information, RSS (news) feeds categorize newly posted articles. Nonetheless, most RSS users must still sift through many articles, within the same feed or across different feeds, to locate those pertaining to their particular interests. Because individual RSS feeds contain so many news articles, further organization is needed to help users quickly locate non-redundant, informative, and related articles of interest. In this paper, we present a novel approach that uses the word-correlation factors in a fuzzy set information retrieval model to (i) filter out redundant news articles from RSS feeds, (ii) shed less-informative articles from the non-redundant ones, and (iii) cluster the remaining informative articles according to the fuzzy equivalence classes generated on the news articles. Our clustering approach incurs little overhead and computational cost, and experimental results show that it outperforms other well-known clustering approaches.
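
The abstract does not spell out how the word-correlation factors are computed or how the classes are generated, so the following is only a minimal sketch of the general idea behind fuzzy equivalence classes, not the authors' method. It assumes a hypothetical, already computed article-to-article similarity matrix `sim` (reflexive and symmetric, values in [0, 1]), takes its max-min transitive closure, and then groups articles whose closure similarity is at least a threshold `alpha`. The function names, the matrix values, and the thresholds are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def max_min_compose(a: Matrix, b: Matrix) -> Matrix:
    """Max-min composition of two square fuzzy relations of equal size."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(relation: Matrix) -> Matrix:
    """Repeat R <- max(R, R o R) until nothing changes (max-min closure)."""
    n = len(relation)
    current = [row[:] for row in relation]
    while True:
        composed = max_min_compose(current, current)
        updated = [
            [max(current[i][j], composed[i][j]) for j in range(n)]
            for i in range(n)
        ]
        if updated == current:
            return current
        current = updated


def alpha_cut_classes(closure: Matrix, alpha: float) -> List[List[int]]:
    """Group indices whose closure similarity is at least alpha.

    Assumes the closure is reflexive, symmetric, and max-min transitive,
    so the alpha-cut is an ordinary equivalence relation.
    """
    n = len(closure)
    assigned = set()
    classes = []
    for i in range(n):
        if i in assigned:
            continue
        members = [j for j in range(n) if closure[i][j] >= alpha]
        assigned.update(members)
        classes.append(members)
    return classes


# Hypothetical similarities among four articles (illustrative values only).
sim = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.3, 0.6, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

closure = transitive_closure(sim)
print(alpha_cut_classes(closure, 0.5))  # [[0, 1, 2], [3]]
print(alpha_cut_classes(closure, 0.7))  # [[0, 1], [2], [3]]
```

In this toy data, articles 0 and 2 are only weakly similar directly (0.3), but the closure raises their similarity to 0.6 through article 1, so they share a class at `alpha = 0.5`. Raising `alpha` splits the classes into finer groups.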

Author information

Authors and Affiliations

Computer Science Department, Brigham Young University, Provo, Utah, U.S.A.
Nathaniel Gustafson, Maria Soledad Pera & Yiu-Kai Ng

Editor information

Osvaldo Gervasi, Beniamino Murgante, Antonio Laganà, David Taniar, Youngsong Mun, Marina L. Gavrilova

About this paper

Cite this paper

Gustafson, N., Pera, M.S., Ng, YK. (2008). Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_20

DOI: https://doi.org/10.1007/978-3-540-69848-7_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69840-1
Online ISBN: 978-3-540-69848-7
eBook Packages: Computer Science, Computer Science (R0)