ABSTRACT
We study the task of learning the preferences of online news readers from their past choices. Previous work has shown that this situation can be modelled as a competition between articles, where the most appealing articles of the day are those selected by the most users. The appeal of an article can be computed from its textual content, and the evaluation function can be learned from training data. In this paper, we show how this task can benefit from an efficient algorithm, based on hashing representations, which enables it to be deployed on high-intensity data streams. We demonstrate the effectiveness of this approach on four real-world news streams, compare it with standard approaches, and describe a new online demonstration based on this technology.
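The combination described above, a learned appeal function over text plus a hashing representation, can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a linear appeal function, a signed hashing trick over whitespace tokens, and a hinge-loss update on pairs of articles where one was preferred over the other. All names (`hash_features`, `score`, `pairwise_update`, `DIM`) and all parameter values are hypothetical.

```python
import zlib

DIM = 2 ** 18  # size of the hashed feature space (assumed value)


def hash_features(text, dim=DIM):
    """Map a bag of words to a sparse signed vector via the hashing trick."""
    vec = {}
    for token in text.lower().split():
        h = zlib.crc32(token.encode("utf-8"))  # unsigned 32-bit hash
        index = h % dim                        # bucket from the low bits
        sign = 1.0 if (h >> 31) & 1 else -1.0  # sign from the top bit
        vec[index] = vec.get(index, 0.0) + sign
    return vec


def score(weights, vec):
    """Linear appeal score: dot product of sparse weights and features."""
    return sum(weights.get(i, 0.0) * v for i, v in vec.items())


def pairwise_update(weights, preferred, other, lr=0.1, margin=1.0):
    """One stochastic gradient step on the hinge loss for a single pair.

    The weights change only when the preferred article does not already
    outscore the other one by at least `margin`.
    """
    diff = dict(preferred)
    for i, v in other.items():
        diff[i] = diff.get(i, 0.0) - v
    if score(weights, diff) < margin:
        for i, v in diff.items():
            weights[i] = weights.get(i, 0.0) + lr * v
        return True
    return False


# Toy stream of (preferred, other) article pairs; the texts are made up.
stream = [
    ("celebrity wedding scandal", "quarterly budget committee report"),
    ("football final penalty drama", "council zoning amendment notice"),
]

weights = {}
for preferred_text, other_text in stream:
    pairwise_update(weights, hash_features(preferred_text), hash_features(other_text))
```

Because the weight vector is indexed by hash bucket rather than by vocabulary entry, memory stays bounded by `DIM` however many distinct words the stream produces, which is the property that makes this kind of representation attractive for streams.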
Index Terms
- Scalable Preference Learning from Data Streams