Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm

Kolnogorov, A. V.; Nazin, A. V.; Shiyan, D. N.

doi:10.1134/S0005117922080100


MATHEMATICAL GAME THEORY AND APPLICATIONS
Published: 16 September 2022

Volume 83, pages 1288–1307, (2022)
Automation and Remote Control



Abstract

We consider the minimax setup for the two-armed bandit problem as applied to data processing when there are two alternative processing methods with different a priori unknown efficiencies. One should determine the more efficient method and ensure its predominant application. To this end, we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order of \( N^{1/2} \), where \( N \) is the amount of processed data, and this bound is order sharp. We propose a batch version of the MDA that allows processing data in packets; this is especially important if parallel data processing can be provided. In this case, the processing time is determined by the number of batches rather than the total amount of data. Unexpectedly, the batch version turns out to behave unlike the ordinary one even if the number of packets is large. Moreover, the batch version provides a considerably lower minimax risk; i.e., it substantially improves the control performance. We explain this result by considering another batch modification of the MDA whose behavior is close to that of the ordinary version and whose minimax risk is close as well. Our estimates use invariant descriptions of the algorithms based on Gaussian approximations of income in the batches of data in the domain of “close” distributions and are obtained by Monte-Carlo simulation.
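To make the setting concrete, the following is a minimal Python sketch of one possible batch strategy of the mirror descent type, together with a Monte-Carlo estimate of its regret. It is not the algorithm of the article, whose details are not given in the abstract. The sketch assumes an entropic (exponential-weights) mirror step, importance-weighted income estimates, Gaussian incomes, and one arm per batch; the names `batch_mirror_descent` and `estimate_risk` and the parameters `step`, `mix`, and `sigma` are hypothetical.

```python
import math
import random


def batch_mirror_descent(means, horizon, batch_size, step,
                         mix=0.05, sigma=1.0, seed=0):
    """Illustrative batch exponential-weights strategy for two arms.

    Returns the regret: (data processed) * max(means) minus collected income.
    All modelling choices here are assumptions, not the article's method.
    """
    rng = random.Random(seed)
    scores = [0.0, 0.0]  # cumulative importance-weighted income estimates
    total_income = 0.0
    n_batches = horizon // batch_size
    for _ in range(n_batches):
        # Entropic mirror step on the simplex: softmax of the scaled scores.
        top = max(scores)
        weights = [math.exp(step * (s - top)) for s in scores]
        norm = sum(weights)
        # Mix with the uniform distribution to keep both probabilities positive.
        probs = [(1.0 - mix) * w / norm + mix / 2.0 for w in weights]
        # Assumption: the whole batch is processed by a single randomly chosen arm.
        arm = 0 if rng.random() < probs[0] else 1
        income = sum(rng.gauss(means[arm], sigma) for _ in range(batch_size))
        total_income += income
        # Unbiased estimate of the batch income for the chosen arm.
        scores[arm] += income / probs[arm]
    return n_batches * batch_size * max(means) - total_income


def estimate_risk(gap, horizon, batch_size, step, runs=200):
    """Monte-Carlo average of the regret for arms with means +gap/2 and -gap/2."""
    regrets = [
        batch_mirror_descent((gap / 2.0, -gap / 2.0), horizon, batch_size,
                             step, seed=s)
        for s in range(runs)
    ]
    return sum(regrets) / runs
```

Under these assumptions, an estimate of the minimax risk would be the maximum of `estimate_risk` over a grid of `gap` values, with gaps of order \( N^{-1/2} \) corresponding to the "close" distributions mentioned in the abstract. Because batch incomes scale with `batch_size`, `step` would have to be chosen with that scale in mind.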



Funding

The research by A.V. Nazin was supported financially by the Russian Science Foundation, project no. 16-11-10015. The research by A.V. Kolnogorov and D.N. Shiyan was supported financially by the Russian Foundation for Basic Research, project no. 20-01-00062.

Author information

Authors and Affiliations

Yaroslav-the-Wise Novgorod State University, Novgorod, 173003, Russia
A. V. Kolnogorov & D. N. Shiyan
Trapeznikov Institute of Control Sciences, Russian Academy of Sciences, Moscow, 117997, Russia
A. V. Nazin


Corresponding authors

Correspondence to A. V. Kolnogorov, A. V. Nazin or D. N. Shiyan.

Additional information

Translated by V. Potapchouck


About this article


Kolnogorov, A.V., Nazin, A.V. & Shiyan, D.N. Two-Armed Bandit Problem and Batch Version of the Mirror Descent Algorithm. Autom Remote Control 83, 1288–1307 (2022). https://doi.org/10.1134/S0005117922080100


Received: 20 November 2020
Revised: 08 December 2020
Accepted: 01 March 2021
Published: 16 September 2022
Issue Date: August 2022
DOI: https://doi.org/10.1134/S0005117922080100
