
Risk-averse trees for learning from logged bandit feedback



Abstract:

Logged data is one of the most widespread forms of recorded information, since it can be acquired by almost any system and stored at little cost. Customarily, the interaction logs between a system and a user (or environment) have the structure of a sequential decision process: given a context, the system performs an action and the user provides feedback on it. This structure is common to a wide range of real-world micro-economic applications, e.g., e-commerce websites and advertisement campaigns. The problem of learning a policy from such logged interactions in order to take more profitable decisions in the future is known as the Learning from Logged Bandit Feedback (LLBF) problem. In this paper, we propose RADT, an algorithm specifically designed for the LLBF setting and based on a risk-averse learning method that exploits the joint use of regression trees and statistical confidence bounds. Unlike existing techniques developed for this setting, RADT generates policies that aim to maximize a lower bound on the expected reward and provides a clear characterization of the features in the context that influence the process the most. Finally, we present an extensive experimental campaign over both synthetic and real-world datasets, showing empirical evidence that RADT outperforms both state-of-the-art machine learning classification and regression techniques and existing methods addressing the LLBF setting.
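
The abstract does not detail RADT's construction, so the sketch below only illustrates the general risk-averse principle it names: partition contexts with a regression tree and act on a statistical lower confidence bound of the estimated reward rather than on the point estimate. Everything in it is an assumption for illustration, not the paper's method: one tree per action, a Hoeffding-style bound, rewards in [0, 1], uniform logging with no propensity re-weighting, and hypothetical class and method names.

# Illustrative sketch only (not the authors' RADT): a risk-averse policy
# learned from logged bandit feedback (context, action, reward).
# Assumptions: one regression tree per action, Hoeffding-style lower
# confidence bound per leaf, rewards in [0, 1], uniform logging policy.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class LowerBoundTreePolicy:
    def __init__(self, n_actions, delta=0.05, max_depth=4):
        self.n_actions = n_actions
        self.delta = delta          # confidence level for the bound
        self.max_depth = max_depth
        self.trees = []             # one regression tree per action
        self.leaf_stats = []        # per tree: {leaf_id: (mean reward, count)}

    def fit(self, X, actions, rewards):
        for a in range(self.n_actions):
            mask = actions == a
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X[mask], rewards[mask])
            # Record empirical mean reward and sample count in each leaf.
            leaves = tree.apply(X[mask])
            stats = {}
            for leaf in np.unique(leaves):
                r = rewards[mask][leaves == leaf]
                stats[leaf] = (r.mean(), len(r))
            self.trees.append(tree)
            self.leaf_stats.append(stats)
        return self

    def _lower_bound(self, mean, count):
        # Hoeffding-style lower confidence bound on the expected reward.
        return mean - np.sqrt(np.log(1.0 / self.delta) / (2.0 * count))

    def predict(self, X):
        # For each context, choose the action whose leaf-level lower bound
        # on expected reward is largest (risk-averse choice).
        bounds = np.empty((len(X), self.n_actions))
        for a, (tree, stats) in enumerate(zip(self.trees, self.leaf_stats)):
            leaves = tree.apply(X)
            bounds[:, a] = [self._lower_bound(*stats[l]) for l in leaves]
        return bounds.argmax(axis=1)

In this sketch the risk aversion enters through the count-dependent penalty: actions whose estimated reward rests on few logged samples are discounted, so the resulting policy prefers well-supported choices over optimistic but poorly observed ones.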
Date of Conference: 14-19 May 2017
Date Added to IEEE Xplore: 03 July 2017
Electronic ISSN: 2161-4407
Conference Location: Anchorage, AK, USA
