research-article

Scalable Preference Learning from Data Streams

Authors:

Thomas Lansdall-Welfare,

Saatviga Sudhahar,

Nello Cristianini

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web

Pages 885 - 890

https://doi.org/10.1145/2740908.2742008

Published: 18 May 2015

Abstract

We study the task of learning the preferences of online readers of news, based on their past choices. Previous work has shown that it is possible to model this situation as a competition between articles, where the most appealing articles of the day are those selected by the most users. The appeal of an article can be computed from its textual content, and the evaluation function can be learned from training data. In this paper, we show how this task can benefit from an efficient algorithm, based on hashing representations, which enables it to be deployed on high-intensity data streams. We demonstrate the effectiveness of this approach on four real-world news streams, compare it with standard approaches, and describe a new online demonstration based on this technology.
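
The general idea of scoring articles from hashed text features and learning from pairwise choices can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learner, so a simple perceptron-style pairwise update is swapped in here, and the function names (`hashed_features`, `score`, `update`), the 18-bit hash space, the MD5-based hash, and the learning rate are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def hashed_features(text, n_bits=18):
    """Map a bag of words to a sparse vector of size 2**n_bits (signed hashing trick).

    Assumption: whitespace tokenisation and an MD5-derived hash, chosen only
    because they are deterministic and available in the standard library.
    """
    dim = 1 << n_bits
    vec = defaultdict(float)
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] & 1 else -1.0  # sign bit reduces collision bias
        vec[index] += sign
    return dict(vec)


def score(weights, features):
    """Linear appeal score: dot product of sparse weights and sparse features."""
    return sum(weights.get(i, 0.0) * v for i, v in features.items())


def update(weights, preferred, other, lr=0.1):
    """One perceptron-style step on a pair (hypothetical learner, not the paper's).

    If the preferred article does not already score strictly higher than the
    other one, move the weights along the feature difference. Returns True
    when an update was made.
    """
    diff = dict(preferred)
    for i, v in other.items():
        diff[i] = diff.get(i, 0.0) - v
    if score(weights, diff) <= 0.0:
        for i, v in diff.items():
            weights[i] = weights.get(i, 0.0) + lr * v
        return True
    return False


# Usage: each streamed pair (chosen article, article passed over) triggers one update.
weights = {}
chosen = hashed_features("royal wedding photos")
skipped = hashed_features("quarterly budget report")
update(weights, chosen, skipped)
```

Because the weight vector lives in a fixed-size hashed space, memory use does not grow with the vocabulary, which is the property that makes this kind of representation attractive for streams.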

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Language resources

Published In

WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web

May 2015

1602 pages

ISBN:9781450334730

DOI:10.1145/2740908

General Chairs:
Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France),
Stefano Leonardi (Sapienza University of Rome, Italy),
Alessandro Panconesi (Sapienza University of Rome, Italy)

Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

ThinkBig
Complacs

Conference

WWW '15

Sponsor:

IW3C2

WWW '15: 24th International World Wide Web Conference

May 18 - 22, 2015

Florence, Italy

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

Cao Y, Feng Y, Wang H, Xie X, and Zhou S. Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(11):7136--7153, 2024. Online publication date: 1-Nov-2024.
https://dl.acm.org/doi/10.1109/TPAMI.2024.3388589

Liu Q, Wu J, Huang Z, Wang H, Ning Y, Chen M, Chen E, Yi J, and Zhou B. Federated User Modeling from Hierarchical Information. ACM Transactions on Information Systems, 41(2):1--33, 2023. Online publication date: 9-Feb-2023.
https://dl.acm.org/doi/10.1145/3560485

Yang M, Tjuawinata I, Lam K, Zhu T, and Zhao J. Differentially Private Distributed Frequency Estimation. IEEE Transactions on Dependable and Secure Computing, 20(5):3910--3926, 2023. Online publication date: 1-Sep-2023.
https://doi.org/10.1109/TDSC.2022.3227654

Miao R, Dong F, Zhao Y, Zhao Y, Wu Y, Yang K, Yang T, and Cui B. SketchConf: A Framework for Automatic Sketch Configuration. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2022--2035, 2023. Online publication date: Apr-2023.
https://doi.org/10.1109/ICDE55515.2023.00157

Wu J, Liu Q, Huang Z, Ning Y, Wang H, Chen E, Yi J, and Zhou B. Hierarchical Personalized Federated Learning for User Modeling. In Proceedings of the Web Conference 2021, pages 957--968, 2021. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3449926
