
Reinforcement Learning with Immediate Rewards and Linear Hypotheses

Abstract

We consider the design and analysis of algorithms that learn from the consequences of their actions, with the goal of maximizing cumulative reward, in the setting where the consequence of each action is felt immediately and an a priori unknown linear function (approximately) relates a feature vector for each action/state pair to the (expected) associated reward. We focus on two cases: one in which a continuous-valued reward is (approximately) given by applying the unknown linear function, and another in which the probability of receiving the larger of two binary-valued rewards is (approximately) given by applying it. For these cases we provide bounds on the per-trial regret of our algorithms that go to zero as the number of trials approaches infinity. We also provide lower bounds showing that this rate of convergence is nearly optimal.
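
As a concrete illustration of this setting (not the paper's algorithm), the following Python sketch simulates the immediate-reward protocol with a linear hypothesis: on each trial the learner observes a feature vector for every available action, chooses one, and immediately receives a noisy reward whose expectation is an unknown linear function of the chosen feature vector. The epsilon-greedy least-squares learner, the problem dimensions, and the noise model are all illustrative assumptions.

    import numpy as np

    # Minimal sketch of the immediate-reward, linear-hypothesis setting.
    # The learner below (epsilon-greedy with a ridge-regularized
    # least-squares estimate) is an illustration of the protocol only.

    rng = np.random.default_rng(0)
    d, n_actions, n_trials = 5, 10, 5000
    w_true = rng.normal(size=d)            # unknown linear hypothesis
    w_true /= np.linalg.norm(w_true)

    A = np.zeros((d, d))                   # least-squares statistics
    b = np.zeros(d)
    cumulative_regret = 0.0

    for t in range(1, n_trials + 1):
        X = rng.normal(size=(n_actions, d))               # per-action features
        w_hat = np.linalg.solve(A + 1e-3 * np.eye(d), b)  # current estimate
        eps = min(1.0, 5.0 / np.sqrt(t))                  # decaying exploration
        if rng.random() < eps:
            a = rng.integers(n_actions)                   # explore
        else:
            a = int(np.argmax(X @ w_hat))                 # exploit estimate
        reward = X[a] @ w_true + 0.1 * rng.normal()       # immediate noisy reward
        A += np.outer(X[a], X[a])
        b += reward * X[a]
        cumulative_regret += np.max(X @ w_true) - X[a] @ w_true

    print(f"per-trial regret after {n_trials} trials: "
          f"{cumulative_regret / n_trials:.4f}")

Under these illustrative assumptions the measured per-trial regret shrinks as the number of trials grows, which is the qualitative behavior described by the upper bounds in the abstract.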


Author information


Correspondence to Naoki Abe, Alan W. Biermann or Philip M. Long.



Cite this article

Abe, N., Biermann, A. & Long, P. Reinforcement Learning with Immediate Rewards and Linear Hypotheses. Algorithmica 37, 263–293 (2003). https://doi.org/10.1007/s00453-003-1038-1

