Multi-armed bandits in the wild: Pitfalls and strategies in online experiments

https://doi.org/10.1016/j.infsof.2019.05.004

Abstract

Context

Delivering faster value to customers with online experimentation is an emerging practice in industry. Multi-Armed Bandit (MAB) based experiments have the potential to deliver even faster results than traditional A/B experiments, with a better allocation of resources. However, the incorrect use of MAB-based experiments can lead to incorrect conclusions that can potentially hurt the company's business.

Objective

The objective of this study is to understand the pitfalls and restrictions of using MABs in online experiments, as well as the strategies that are used to overcome them.

Method

This research uses a multiple case study method with eleven experts across five software companies, combined with simulations to triangulate some of the identified limitations.

Results

This study analyzes limitations faced by companies using MABs and discusses strategies used to overcome them. The results are summarized into practitioners’ guidelines with criteria to select an appropriate experimental design.

Conclusion

MAB algorithms have the potential to deliver even faster results than traditional A/B experiments, with a better allocation of resources. However, potential mistakes can occur and hinder the benefits of such an approach. Together with the provided guidelines, we aim for this paper to serve as reference material for practitioners during the design of an online experiment.

Introduction

Delivering faster value to customers with online experimentation is an emerging practice in industry [1], [2], [3]. Web-facing software companies (such as Microsoft, Google, Netflix, Booking.com, Yelp, and Amazon, among others) often report on success cases and the competitive advantage of using post-deployment data together with online controlled experiments as an integral part of their development methodologies [2], [4], [5], [6], [7], [8], [9], [10], [11]. This competitive advantage leads companies to experiment with almost every change made to their systems, from developing new functionality to fine-tuning their systems, and this intensive use is leading companies to deploy thousands of experiments every year [11], [12], [13]. A famous example of an online experiment is the ‘50 shades of blue’ experiment at Google, in which engineers tested which shade of blue for a hyperlink on Google's search page performed best. The winning shade of blue resulted in an additional 200 million dollars in revenue [14], [15].

To support the diversity and the scale of experiments, software companies and academic researchers are developing innovative solutions for automating experiments, scaling the experimentation infrastructure, and developing new algorithms and experimental designs [6], [11], [12], [16], [17], [18]. One emerging class of algorithms, known as Multi-Armed Bandit (MAB) algorithms [19], [20], is being widely explored in the context of online experiments, as it has the potential to deliver faster results with a better allocation of resources [16] than traditional experiments such as A/B testing. However, the incorrect use of MAB-based experiments can lead to misinterpretations and wrong conclusions that can potentially hurt the company's business.
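To make the contrast concrete, the following minimal sketch (illustrative only; the variant names and conversion rates are assumptions, not data from this study) shows how a simple epsilon-greedy bandit gradually shifts traffic toward the better-performing variant as evidence accumulates, whereas a traditional A/B test would keep a fixed 50/50 split for the entire experiment.

    import random

    def epsilon_greedy_assign(successes, trials, epsilon=0.1):
        """Pick a variant: explore uniformly with probability epsilon,
        otherwise exploit the variant with the highest observed conversion rate."""
        if random.random() < epsilon:
            return random.randrange(len(trials))
        rates = [s / t if t > 0 else 0.0 for s, t in zip(successes, trials)]
        return max(range(len(rates)), key=lambda i: rates[i])

    # Two variants (e.g. A and B). An A/B test splits traffic 50/50 for the whole
    # experiment; the bandit instead reallocates traffic as data arrives.
    successes, trials = [0, 0], [0, 0]
    true_rates = [0.05, 0.07]          # unknown in practice; assumed here for the demo
    for _ in range(10_000):
        arm = epsilon_greedy_assign(successes, trials)
        trials[arm] += 1
        successes[arm] += int(random.random() < true_rates[arm])
    print("traffic per variant:", trials)   # most traffic ends up on the better variant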

To the best of the authors’ knowledge, there is no prior work that discusses the limitations of MAB-based experiments. This work attempts to address this gap from the industry perspective by combining a multiple case study with simulations. The study analyzes limitations faced by companies using MABs and discusses strategies used to overcome them. The results are summarized into practitioners’ guidelines with criteria to select an appropriate experimental design.

The remainder of the paper is organized as follows. Section 2 provides a background review of the MAB problem and algorithms, controlled experiments and A/B testing, and experimentation processes. Section 3 discusses the research method and threats to validity. Section 4 presents and discusses the restrictions associated with MAB implementations for online experiments. Section 5 discusses the results, use cases where MAB algorithms are desirable, and a guideline process for choosing between traditional experimentation techniques such as A/B experiments and MABs. Section 6 concludes and discusses related research challenges.

Section snippets

Background

In this section, we consider the different aspects of running online experiments. We describe a traditional online experiment in the form of an A/B test and discuss some of the limitations of this method. Next, we present the MAB class of problems and discuss some of the advantages of MAB. In the appendix, we present the MAB algorithms used in the simulations.
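For reference alongside this background, one common fixed-horizon analysis of a traditional A/B test is a two-proportion z-test on conversion counts. The sketch below is illustrative only (it is not taken from the paper or its appendix) and uses made-up counts.

    from math import sqrt, erf

    def two_proportion_z_test(success_a, n_a, success_b, n_b):
        """Classic fixed-horizon A/B analysis: test whether two conversion rates differ."""
        p_a, p_b = success_a / n_a, success_b / n_b
        p_pool = (success_a + success_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided
        return z, p_value

    # Hypothetical counts: 500/10,000 conversions for A vs. 560/10,000 for B.
    z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")   # reject H0 at alpha = 0.05 only if p < 0.05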

Research method

In earlier discussions with practitioners, we identified that, although academic research suggests that MAB algorithms provide several benefits, companies were not using MABs extensively in practice. Some of these companies suggested that the algorithms did not deliver the expected benefits and even showed several limitations. Based on these observations, we designed this study to identify the restrictions and pitfalls of MAB-based experiments from the point of view of industry practitioners.

Results

This section discusses the results obtained from the empirical data collected in the interviews and from the simulations.
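The simulations referenced here are the authors' own; the sketch below is not their code, only a minimal example of the kind of comparison such simulations make, contrasting the cumulative regret of a fixed 50/50 allocation with Thompson sampling under assumed conversion rates.

    import random

    def simulate(policy, true_rates, horizon=20_000):
        """Return cumulative regret of a two-variant experiment under a given policy."""
        best = max(true_rates)
        succ, fail, regret = [0, 0], [0, 0], 0.0
        for t in range(horizon):
            arm = policy(t, succ, fail)
            regret += best - true_rates[arm]
            if random.random() < true_rates[arm]:
                succ[arm] += 1
            else:
                fail[arm] += 1
        return regret

    def fixed_split(t, succ, fail):      # A/B test: alternate arms, i.e. a 50/50 split
        return t % 2

    def thompson(t, succ, fail):         # Thompson sampling with Beta(1, 1) priors
        samples = [random.betavariate(1 + succ[i], 1 + fail[i]) for i in range(2)]
        return max(range(2), key=lambda i: samples[i])

    rates = [0.05, 0.07]                 # assumed conversion rates for the demo
    print("A/B regret:      ", round(simulate(fixed_split, rates), 1))
    print("Thompson regret: ", round(simulate(thompson, rates), 1))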

Discussion

Feature experiments, powered by MABs, can provide a competitive edge for organizations, but only when skillfully applied. Several potential pitfalls can hinder the benefits of using MABs. For example, popular experimentation models, such as HYPEX [22] or RIGHT [23], may not be well-aligned with MAB-based experiments. In particular, these models often assume that the experimental process should minimize type I errors (false positives) rather than minimize regret (the opportunity cost of not selecting the best variant).
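To make the notion of regret concrete, a back-of-the-envelope calculation with hypothetical conversion rates (these numbers are assumptions, not results from the study):

    # Regret of a fixed 50/50 A/B test with true conversion rates p_A = 0.05 and p_B = 0.07:
    # half of the N users see the inferior variant, each costing (p_B - p_A) expected conversions.
    N = 100_000
    p_a, p_b = 0.05, 0.07
    expected_regret = (N / 2) * (p_b - p_a)   # conversions lost vs. always showing B
    print(expected_regret)                    # 1000.0 conversions lost over the experiment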

Conclusion

Delivering faster value to customers with online experiments is an emerging practice in industry. MAB algorithms have the potential to deliver even faster results than traditional A/B experiments, with a better allocation of resources. This work describes common models, paradigms, and algorithms for MAB-based feature experiments currently used in industry. Based on a study with 11 experts across 5 companies, we identified potential mistakes that can occur when designing a feature experiment and strategies to overcome them.

Acknowledgments

This work was partially supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The authors also thank the companies and the interviewees involved in this study for the opportunity to conduct this study with them. Finally, the authors gratefully acknowledge anonymous reviewers, whose comments significantly improved this paper.

References (53)

  • H. Gui et al., Network A/B testing
  • X. He et al., Practical lessons from predicting clicks on ads at Facebook
  • A. Fabijan et al., The evolution of continuous experimentation in software product development
  • Y. Xu, W. Duan, S. Huang, SQR: balancing speed, quality and risk in online experiments, no. 1, pp. 1–9, ...
  • K. Kevic et al., Characterizing experimentation in continuous deployment: a case study on Bing
  • R.L. Kaufman, J. Pitchforth, L. Vermeer, Democratizing online controlled experiments at Booking.com, 23-Oct-2017, ...
  • R. Kohavi et al., Online controlled experiments at large scale
  • A. Hern, Why Google has 200 m reasons to put engineers over designers, The Guardian (2014)
  • L. Bottou et al., Counterfactual reasoning and learning systems, J. Mach. Learn. Res. (2013)
  • G. Burtini, J. Loeppky, R. Lawrence, A survey of online experiment design with the stochastic multi-armed bandit, ...
  • P. Dmitriev et al., Measuring metrics
  • S.L. Scott, Multi-armed bandit experiments in the online service economy, Appl. Stoch. Model. Bus. Ind. (2015)
  • R.S. Sutton et al., Reinforcement Learning: An Introduction (1998)
  • D.C. Montgomery, Design and Analysis of Experiments (2012)
  • H.H. Olsson et al., The HYPEX model: from opinions to data-driven software development, Contin. Softw. Eng. (2014)
  • F. Fagerholm et al., The RIGHT model for continuous experimentation, J. Syst. Softw. (2017)