Reinforcement Learning with Immediate Rewards and Linear Hypotheses

Abe, Naoki; Biermann, Alan W.; Long, Philip M.

doi:10.1007/s00453-003-1038-1

Published: 30 September 2003

Algorithmica, Volume 37, pages 263–293 (2003)

Abstract

We consider the design and analysis of algorithms that learn from the consequences of their actions with the goal of maximizing their cumulative reward, when the consequence of a given action is felt immediately, and a linear function, which is unknown a priori, (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases, one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of binary-valued rewards is obtained. For these cases we provide bounds on the per-trial regret for our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds that show that the rate of convergence is nearly optimal.
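
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not one of the algorithms analyzed in the paper; it is a hypothetical learner that, under assumptions chosen only for illustration, combines decaying random exploration with a Widrow-Hoff (least-mean-squares) update of the estimated linear function. The names `run_trials`, `true_w`, `step`, and the exploration schedule `t ** (-1/3)` are all assumptions. It covers only the continuous-valued reward case and reports the per-trial regret, that is, the average gap between the expected reward of the best available action and that of the chosen action.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def run_trials(true_w, n_trials, n_actions=5, noise=0.1, step=0.1, seed=0):
    """Simulate a learner with immediate rewards and a linear hypothesis.

    Returns (estimated weights, per-trial regret). All parameters are
    illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    d = len(true_w)
    w_hat = [0.0] * d          # current estimate of the unknown linear function
    total_regret = 0.0
    for t in range(1, n_trials + 1):
        # One feature vector per available action in the current state.
        feats = [[rng.uniform(-1.0, 1.0) for _ in range(d)]
                 for _ in range(n_actions)]
        # Explore at random with a probability that decays over time
        # (assumed schedule); otherwise act greedily on the estimate.
        if rng.random() < min(1.0, t ** (-1.0 / 3.0)):
            choice = rng.randrange(n_actions)
        else:
            choice = max(range(n_actions), key=lambda a: dot(w_hat, feats[a]))
        x = feats[choice]
        # The reward is felt immediately: linear in the features plus noise.
        reward = dot(true_w, x) + rng.gauss(0.0, noise)
        # Widrow-Hoff update toward the observed reward.
        err = reward - dot(w_hat, x)
        w_hat = [w + step * err * xi for w, xi in zip(w_hat, x)]
        # Regret is measured in expected reward, using the true weights.
        best = max(dot(true_w, f) for f in feats)
        total_regret += best - dot(true_w, x)
    return w_hat, total_regret / n_trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        _, regret = run_trials([0.5, -0.3], n)
        print(n, round(regret, 4))
```

Running the script prints the per-trial regret for increasing numbers of trials, which gives a rough empirical feel for the kind of quantity the paper bounds. The simulation makes no claim about the convergence rates established in the article.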

Author information

Authors and Affiliations

Naoki Abe: IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Alan W. Biermann: Department of Computer Science, Duke University, P.O. Box 90129, Durham, NC 27708, USA
Philip M. Long: Genome Institute of Singapore, 1 Science Park Road, #05-01, Singapore 117528, Republic of Singapore

Corresponding authors

Correspondence to Naoki Abe, Alan W. Biermann or Philip M. Long.

About this article

Cite this article

Abe, N., Biermann, A. & Long, P. Reinforcement Learning with Immediate Rewards and Linear Hypotheses. Algorithmica 37, 263–293 (2003). https://doi.org/10.1007/s00453-003-1038-1

Received: 13 December 2001
Revised: 26 August 2002
Published: 30 September 2003
Issue Date: December 2003
DOI: https://doi.org/10.1007/s00453-003-1038-1
