DOI: 10.1145/3394486.3406484
tutorial

Learning by Exploration: New Challenges in Real-World Environments

Published: 20 August 2020

Abstract

Learning is a predominant theme for any intelligent system, human or machine. Moving beyond the classical paradigm of learning from past experience, e.g., offline supervised learning from given labels, a learner needs to actively collect exploratory feedback to learn about the unknown, i.e., to learn through exploration. This tutorial will introduce the learning-by-exploration paradigm, which is the key ingredient in many interactive online learning problems, including multi-armed bandit and, more generally, reinforcement learning problems.
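To make the need for exploratory feedback concrete, the following is a minimal sketch (our own illustration, not part of the tutorial materials) of a two-armed Bernoulli bandit in which a purely greedy learner, observing a reward only for the arm it pulls, can commit to the worse arm after a single unlucky draw; the reward probabilities and the learner are hypothetical.

    import random

    # Illustrative sketch: greedy action selection under bandit feedback.
    # The reward probabilities below are made up for illustration.
    true_means = [0.4, 0.7]              # hidden Bernoulli reward probabilities
    counts = [0, 0]                      # pulls per arm
    estimates = [0.0, 0.0]               # running-mean reward estimates

    def pull(arm):
        # Only the chosen arm's reward is revealed to the learner.
        return 1.0 if random.random() < true_means[arm] else 0.0

    for t in range(1000):
        if t < 2:
            arm = t                                          # try each arm once
        else:
            arm = max(range(2), key=lambda a: estimates[a])  # purely greedy afterwards
        reward = pull(arm)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    # If arm 0 happens to pay off on its first pull while arm 1 does not, the
    # greedy learner keeps pulling the worse arm forever; exploration is what
    # breaks this failure mode.
    print("pulls per arm:", counts, "estimates:", estimates)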
In this tutorial, we will first motivate the need for exploration in machine learning algorithms and highlight its importance in many real-world problems that involve online sequential decision making. In such real-world application scenarios, considerable challenges arise, including sample complexity, costly and even outdated feedback, and ethical considerations around exploration (such as fairness and privacy). We will introduce several classical exploration strategies, then highlight the aforementioned three fundamental challenges in the learning-by-exploration paradigm and introduce recent research developments addressing each of them.
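As one example of the classical exploration strategies the tutorial refers to, here is a minimal sketch of the UCB1 rule of Auer, Cesa-Bianchi, and Fischer (2002) on a toy three-armed Bernoulli bandit; the arm probabilities and horizon are made up for illustration, and this is not code from the tutorial itself.

    import math
    import random

    # Illustrative sketch of UCB1: pick the arm with the largest optimistic
    # estimate (empirical mean plus a confidence bonus that shrinks with pulls).
    true_means = [0.2, 0.5, 0.7]         # hidden Bernoulli reward probabilities
    counts = [0] * 3
    totals = [0.0] * 3

    def ucb1_pick(t):
        for a in range(3):
            if counts[a] == 0:           # pull every arm once before using the bound
                return a
        return max(range(3),
                   key=lambda a: totals[a] / counts[a]
                   + math.sqrt(2 * math.log(t) / counts[a]))

    for t in range(1, 10001):
        arm = ucb1_pick(t)
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    # Pulls concentrate on the best arm (arm 2), while the confidence bonus
    # guarantees every arm keeps being tried occasionally.
    print("pulls per arm:", counts)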

Cited By

  • (2023) Information seeking behavior of medical students in Sri Lanka. Information Development. https://doi.org/10.1177/02666669231179398 (online 12 June 2023)
  • (2021) Interactive Information Retrieval with Bandit Feedback. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2658-2661. https://doi.org/10.1145/3404835.3462810 (online 11 July 2021)

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. learning by exploration
  2. multi-armed bandit
  3. reinforcement learning

Qualifiers

  • Tutorial

Funding Sources

  • National Science Foundation

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
