An adaptive personalized news dissemination system

Katakis, Ioannis; Tsoumakas, Grigorios; Banos, Evangelos; Bassiliades, Nick; Vlahavas, Ioannis

doi:10.1007/s10844-008-0053-8

Published: 28 February 2008

Journal of Intelligent Information Systems, volume 32, pages 191–212 (2009)

Ioannis Katakis¹,
Grigorios Tsoumakas¹,
Evangelos Banos¹,
Nick Bassiliades¹ &
Ioannis Vlahavas¹

Abstract

With the explosive growth of the World Wide Web, information overload has become a crucial concern. In a data-rich, information-poor environment like the Web, separating useful or desirable information from large volumes of mostly worthless data has become a tedious task. The role of Machine Learning in tackling this problem is thoroughly discussed in the literature, but few systems are available for public use. In this work, we bridge theory and practice by implementing a web-based news reader enhanced with a specifically designed machine learning framework for dynamic content personalization. This lets us examine applicability and implementation issues and discuss the effectiveness of machine learning methods for the classification of real-world text streams. The main features of our system, named PersoNews, are: (a) the aggregation of many different news sources that offer an RSS version of their content, (b) incremental filtering, offering dynamic personalization of the content not only per user but also per feed to which a user is subscribed, and (c) the ability for every user to watch a more abstract topic of interest by filtering through a taxonomy of topics. PersoNews is freely available for public use on the WWW (http://news.csd.auth.gr).
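
Feature (b) above, incremental filtering per user and per feed, can be illustrated with a minimal sketch. The abstract does not name the classifier, so the multinomial Naive Bayes learner below is an assumption chosen for brevity, and every name in it (`IncrementalNaiveBayes`, `give_feedback`, `classify`, the labels `interesting` and `uninteresting`) is hypothetical rather than taken from PersoNews. The sketch keeps one model per (user, feed) pair and lets the vocabulary grow as feedback arrives, so no initial training set is required.

```python
import math
import re
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes updated one document at a time.

    Illustrative assumption: the vocabulary grows with every update,
    so the model can start with no training data at all.
    """

    def __init__(self):
        self.doc_counts = Counter()               # label -> documents seen
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrences
        self.vocabulary = set()                   # all words seen so far

    @staticmethod
    def tokenize(text):
        # Lowercase alphabetic tokens only; a real system would also stem.
        return re.findall(r"[a-z]+", text.lower())

    def update(self, text, label):
        """Fold one labelled news item into the model."""
        words = self.tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def predict(self, text, default="interesting"):
        """Return the most probable label, or `default` before any feedback."""
        if not self.doc_counts:
            return default
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in self.tokenize(text) if w in self.vocabulary]
        best_label, best_score = default, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                # Laplace-smoothed log likelihood of each known word.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical registry: one independent filter per (user, feed) pair.
filters = defaultdict(IncrementalNaiveBayes)


def give_feedback(user, feed, text, label):
    """Record a user's judgement of one item from one feed."""
    filters[(user, feed)].update(text, label)


def classify(user, feed, text):
    """Classify a new item with the filter of this user and feed."""
    return filters[(user, feed)].predict(text)
```

Because each (user, feed) pair owns its model, feedback that one user gives on one feed does not affect that user's other feeds or any other user's filters.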

Notes

The Apache SpamAssassin Project: http://spamassassin.apache.org/
SpamBayes: Bayesian Anti-Spam Classifier: http://spambayes.sourceforge.net/
Mozilla Thunderbird: http://www.mozilla.com/thunderbird/
Google Reader: http://reader.google.com
Bloglines: http://www.bloglines.com
SharpReader: http://sharpreader.net
Digg: http://digg.com
NewsCloud: http://newscloud.com
Findory: http://www.findory.com
Spotback: http://www.spotback.com
Reddit: http://www.reddit.com
Google News: http://news.google.com
PNS: http://pns.iit.demokritos.gr/
MyFeedz: http://www.myfeedz.com
http://spambayes.sourceforge.net/, http://popfile.sourceforge.net/
http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html
Both datasets are available at http://mlkd.csd.auth.gr/datasets.html
Note that all IFS-enhanced methods can be applied with no initial training set. Unfortunately, the three baseline methods described in the section need a set of training documents in order to construct the feature space that they use.
The respective figures for the spam corpus are similar.
http://www.acm.org/class/
http://www.w3.org/MarkUp/
http://www.w3.org/Style/CSS/
http://www.mozilla.org/js/
http://www.dhtmlcentral.com/
http://www.debian.org
http://www.apache.org
http://www.mysql.com
http://www.php.net
http://pear.php.net
http://magpierss.sourceforge.net/
http://smarty.php.net
http://news.csd.auth.gr
As positive, we consider the characterization of a message as uninteresting.

Acknowledgements

This work was partially supported by a PENED program (EPAN M.8.3.1, No. 03EΔ73), jointly funded by the European Union and the Greek Government (General Secretariat of Research and Technology/GSRT).

Author information

Authors and Affiliations

Department of Informatics, Aristotle University, 54124, Thessaloniki, Greece
Ioannis Katakis, Grigorios Tsoumakas, Evangelos Banos, Nick Bassiliades & Ioannis Vlahavas

Corresponding author

Correspondence to Ioannis Katakis.

About this article

Cite this article

Katakis, I., Tsoumakas, G., Banos, E. et al. An adaptive personalized news dissemination system. J Intell Inf Syst 32, 191–212 (2009). https://doi.org/10.1007/s10844-008-0053-8

Received: 15 May 2007
Revised: 11 January 2008
Accepted: 15 January 2008
Published: 28 February 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s10844-008-0053-8
