A formal proof of the 𝜖-optimality of discretized pursuit algorithms


Abstract

Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the discretized algorithms have been proven to converge even faster than their continuous counterparts. However, it has recently been reported that the proofs of 𝜖-optimality given for all of these algorithms over the past three decades are flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the 𝜖-optimality of the Discretized Pursuit Algorithm (DPA) that does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given (Zhang et al., Appl Intell 41:974–985, [3]) for an absorbing version of the CPA, which utilizes the single-action Hoeffding’s inequality, the current proof invokes what we shall refer to as the “multi-action” version of Hoeffding’s inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally establishing the 𝜖-optimality of the other EAs that possess absorbing states.
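
To make the scheme being analysed concrete, the following is a minimal Python sketch of a reward-inaction Discretized Pursuit Automaton in the spirit of the DPA of Oommen and Lanctôt [10]. It is an illustration only: the resolution parameter N, the initial sampling phase, the horizon, and all variable names are our own choices rather than the paper's notation.

import random

def discretized_pursuit(env_reward_probs, N=100, horizon=100_000, init_pulls=10):
    """Illustrative reward-inaction Discretized Pursuit Automaton (DPA).

    env_reward_probs : reward probabilities d_1..d_r of a stationary Environment
                       (unknown to the automaton).
    N                : resolution parameter; the discretization step is 1/(r*N).
    Returns the action-probability vector once an absorbing barrier is reached
    (some p_i equals 1) or the horizon is exhausted.
    """
    r = len(env_reward_probs)
    step = 1.0 / (r * N)            # smallest jump of the discretized random walk
    p = [1.0 / r] * r               # action probabilities p_1..p_r
    wins = [0] * r                  # reward counts, used for the estimates d_hat_i
    pulls = [0] * r                 # selection counts

    def sample(i):
        """Apply action alpha_i once and update its reward estimate."""
        rewarded = random.random() < env_reward_probs[i]
        wins[i] += rewarded
        pulls[i] += 1
        return rewarded

    for i in range(r):              # initialize the estimates by sampling
        for _ in range(init_pulls): # every action a few times
            sample(i)

    for _ in range(horizon):
        if max(p) == 1.0:           # absorbing barrier: the automaton has converged
            break
        i = random.choices(range(r), weights=p)[0]
        if sample(i):               # reward-inaction: update p only on a reward
            best = max(range(r), key=lambda j: wins[j] / pulls[j])
            for j in range(r):      # move probability mass, in discrete steps,
                if j != best:       # towards the currently best-estimated action
                    p[j] = max(p[j] - step, 0.0)
            p[best] = 1.0 - sum(p[j] for j in range(r) if j != best)
    return p

# Example: a four-action Environment in which the first action is optimal.
print(discretized_pursuit([0.8, 0.6, 0.4, 0.2]))

In a typical run the automaton is absorbed into the optimal action, i.e., the returned vector is [1.0, 0.0, 0.0, 0.0]; the 𝜖-optimality result proven in the paper is the formal statement that, for a sufficiently fine resolution, this occurs with arbitrarily high probability.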


Notes

  1. PAs have also been extended to operate under the Reward-Penalty paradigms [13]. We do not consider these here.

  2. In addition, like the proofs of asymptotic convergence, the finite-time analyses of both the CPA and the DPA were also presented in [2]. Unfortunately, these analyses are also flawed, inasmuch as they too rely on the above-mentioned “monotonicity” assumption on the probability of selecting the optimal action.

  3. We state the conditions and parameters for the CPA, while the analogous conditions and parameters for the DPA are stated in parentheses to avoid repetition. Strictly speaking, we did not have to mention the conditions and parameters for the CPA at all. However, we have opted to include them because the proof given in [25], which demonstrated the flaw, is based on the CPA, and we believe that this improves the readability of the present paper.

  4. In the interest of simplicity, at this juncture we have assumed in (4) that the \(\hat {d}_{j}\)’s are independent of each other. We believe that this assumption can be easily relaxed by considering only the individual \(d_{j}\)’s and not all of them together.

  5. If one is interested in pursuing the general r-action scenario in greater detail without invoking the 2-action results, the arguments involved are almost identical, except that the algebra is a little more cumbersome. Without going into the detailed algebraic manipulations, we can submit the arguments as follows. For the specific Environment, we define:

    $$\begin{array}{@{}rcl@{}} H_{j} &=& d_{m}-d_{j}, j \neq m,\\ \hat{H}_{j}(t) &=& \hat{d}_{m}(t)-\hat{d}_{j}(t), j \neq m. \end{array} $$

    Then, given any \(\delta \in (0, 1)\), if we denote \(\delta^{\star} = 1-\sqrt[r-1]{1-\delta}\), we can show that there exists a time instant \(t_{0}\), such that within the time defined by \(t_{0}\), \(\alpha_{m}\) has been selected more than \(\left\lceil \frac{-\ln{\delta^{\star}}}{(\min\{H_{j}\})^{2}} \right\rceil\) times, and each \(\alpha_{j}\ (j \neq m)\) has been selected more than \(\left\lceil \frac{-\ln{\delta^{\star}}}{H_{j}^{2}} \right\rceil\) times. Consequently, for all \(t > t_{0}\), \(q_{j}(t) > 1-\delta^{\star}\) and \(q(t) \geq \prod\limits_{j=1 \ldots r,\, j \neq m}{q_{j}(t)} > 1-\delta\). A small numerical sketch of these quantities is given below.
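
As a concrete illustration, the following Python fragment evaluates \(\delta^{\star}\) and the two selection counts of Note 5 for a hypothetical four-action Environment; the reward probabilities [0.8, 0.6, 0.4, 0.2] and the choice \(\delta = 0.05\) are ours and purely illustrative.

import math

def selection_bounds(d, delta):
    """Evaluate delta_star and the selection counts appearing in Note 5."""
    r = len(d)
    m = max(range(r), key=lambda j: d[j])                # index of the optimal action alpha_m
    delta_star = 1.0 - (1.0 - delta) ** (1.0 / (r - 1))  # delta* = 1 - (1 - delta)^(1/(r-1))
    H = {j: d[m] - d[j] for j in range(r) if j != m}     # H_j = d_m - d_j
    n_j = {j: math.ceil(-math.log(delta_star) / H[j] ** 2) for j in H}
    n_m = math.ceil(-math.log(delta_star) / min(H.values()) ** 2)
    return delta_star, n_m, n_j

delta_star, n_m, n_j = selection_bounds([0.8, 0.6, 0.4, 0.2], delta=0.05)
print(delta_star)  # about 0.017; by construction (1 - delta_star)**(r - 1) = 1 - delta
print(n_m)         # selections required of the optimal action alpha_m (102 here)
print(n_j)         # selections required of each alpha_j, j != m ({1: 102, 2: 26, 3: 12} here)

Once these selection counts are exceeded, each \(q_{j}(t)\) exceeds \(1-\delta^{\star}\), and hence their product exceeds \(1-\delta\), which is the claim above.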

References

  1. Zhang X, Oommen BJ, Granmo O-C, Jiao L (2014) Using the theory of regular functions to formally prove the 𝜖-optimality of discretized pursuit learning algorithms. In: Proceedings of IEA-AIE 2014. Springer, Kaohsiung, Taiwan, pp 379–388

  2. Rajaraman K, Sastry PS (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans Syst Man Cybern B: Cybern 26:590–598


  3. Zhang X, Granmo O-C, Oommen BJ, Jiao L (2014) A formal proof of the 𝜖-optimality of absorbing continuous pursuit algorithms using the theory of regular functions. Appl Intell 41:974–985


  4. Narendra KS, Thathachar MAL (1989) Learning automata: an introduction. Prentice Hall

  5. Oommen BJ (1986) Absorbing and ergodic discretized two-action learning automata. IEEE Trans Syst Man Cybern 16:282–296


  6. Thathachar MAL, Sastry PS (1986) Estimator algorithms for learning automata. In: Proceedings of the Platinum Jubilee Conference on Systems and Signal Processing, Bangalore, India, pp 29–32

  7. Agache M, Oommen BJ (2002) Generalized pursuit learning schemes: new families of continuous and discretized learning automata. IEEE Trans Syst Man Cybern B: Cybern 32(6):738–749


  8. Zhang X, Granmo O-C, Oommen BJ (2011) The Bayesian pursuit algorithm: A new family of estimator learning automata. In: Proceedings of IEA-AIE 2011. Springer, New York, USA, pp 608–620

  9. Zhang X, Granmo O-C, Oommen BJ (2013) On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata. Appl Intell 39:782–792


  10. Oommen BJ, Lanctôt JK (1990) Discretized pursuit learning automata. IEEE Trans Syst Man Cybern 20:931–938


  11. Lanctôt JK, Oommen BJ (1991) On discretizing estimator-based learning algorithms. IEEE Trans Syst Man Cybern B: Cybern 2:1417–1422


  12. Lanctôt JK, Oommen BJ (1992) Discretized estimator learning automata. IEEE Trans Syst Man Cybern B: Cybern 22(6):1473–1483


  13. Oommen BJ, Agache M (2001) Continuous and discretized pursuit learning schemes: various algorithms and their comparison. IEEE Trans Syst Man Cybern B: Cybern 31(3):277–287


  14. Zhang X, Granmo O-C, Oommen BJ (2012) Discretized Bayesian pursuit - a new scheme for reinforcement learning. In: Proceedings of IEA-AIE 2012, Dalian, China, pp 784–793

  15. Oommen BJ, Granmo O-C, Pedersen A (2007) Using stochastic AI techniques to achieve unbounded resolution in finite player Goore Games and its applications. In: Proceedings of IEEE Symposium on Computational Intelligence and Games, Honolulu, HI, pp 161–167

  16. Beigy H, Meybodi MR (2000) Adaptation of parameters of BP algorithm using learning automata. In: Proceedings of Sixth Brazilian Symposium on Neural Networks, Rio de Janeiro, Brazil, pp 24–31

  17. Granmo O-C, Oommen BJ, Myrer S-A, Olsen MG (2007) Learning automata-based solutions to the nonlinear fractional knapsack problem with applications to optimal resource allocation. IEEE Trans Syst Man Cybern B 37(1):166–175


  18. Unsal C, Kachroo P, Bay JS (1999) Multiple stochastic learning automata for vehicle path control in an automated highway system. IEEE Trans Syst Man Cybern A 29:120–128


  19. Oommen BJ, Roberts TD (2000) Continuous learning automata solutions to the capacity assignment problem. IEEE Trans Comput 49:608–620


  20. Granmo O-C (2010) Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Trans Comput 59(4):545–560


  21. Oommen BJ, Croix TDS (1997) String taxonomy using learning automata. IEEE Trans Syst Man Cybern 27:354–365


  22. Oommen BJ, de St. Croix EV (1996) Graph partitioning using learning automata. IEEE Trans Comput 45:195–208


  23. Dean T, Angluin D, Basye K, Engelson S, Kaelbling L, Maron O (1995) Inferring finite automata with stochastic output functions and an application to map learning. Mach Learn 18:81–108


  24. Song Y, Fang Y, Zhang Y (2007) Stochastic channel selection in cognitive radio networks. In: Proceedings of IEEE Global Telecommunications Conference, Washington DC, USA, pp 4878–4882

  25. Martin R, Tilak O (2012) On 𝜖-optimality of the pursuit learning algorithm. J Appl Probab 49(3):795–805


  26. Zhang X, Granmo O-C, Oommen BJ, Jiao L (2013) On using the theory of regular functions to prove the 𝜖-optimality of the continuous pursuit learning automaton. In: Proceedings of IEA-AIE 2013. Springer, Amsterdam, Holland, pp 262–271

  27. Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30



Author information

Corresponding author

Correspondence to Xuan Zhang.

Additional information

This work was partially supported by NSERC, the Natural Sciences and Engineering Research Council of Canada. A preliminary version of some of the results of this paper was presented at IEA-AIE 2014, the 27th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Kaohsiung, Taiwan, in June 2014 [1]. That preliminary version won the Best Paper Award at the conference.

Cite this article

Zhang, X., Oommen, B.J., Granmo, OC. et al. A formal proof of the 𝜖-optimality of discretized pursuit algorithms. Appl Intell 44, 282–294 (2016). https://doi.org/10.1007/s10489-015-0670-1

