Sayer: Using Implicit Feedback to Optimize System Policies

Authors: Mathias Lécuyer, Mihir Nanavati, Siddhartha Sen, Aleksandrs Slivkins, Amit Sharma

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing

Pages 273 - 288

https://doi.org/10.1145/3472883.3487001

Published: 01 November 2021

Abstract

We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened had it waited less than X min, because time has a cumulative property. This feedback tells us about alternative decisions, and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias.
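The waiting example can be made concrete with a small sketch. It is an illustration only, not code from the paper: the function name `implied_outcomes` and its arguments are hypothetical, and it assumes the event time does not depend on how long the system chose to wait. It shows why the feedback is one-sided: shorter waits are always revealed, while longer waits are revealed only if the event occurred within the logged wait.

```python
from typing import Dict, Optional, Sequence


def implied_outcomes(
    waited: float,
    event_time: Optional[float],
    candidates: Sequence[float],
) -> Dict[float, Optional[bool]]:
    """Infer, for each candidate wait time, whether the event would have been seen.

    waited:      how long the system actually waited (the logged decision).
    event_time:  when the event occurred, or None if it was not seen within `waited`.
    Returns True/False where the outcome can be inferred, None where it is unknown.
    """
    if event_time is not None and event_time > waited:
        raise ValueError("an event seen after the wait ended cannot be logged")

    outcomes: Dict[float, Optional[bool]] = {}
    for w in candidates:
        if event_time is not None:
            # The event time is known, so every candidate wait is revealed.
            outcomes[w] = event_time <= w
        elif w <= waited:
            # No event within `waited`, so none within any shorter wait either.
            outcomes[w] = False
        else:
            # A longer wait might have seen the event; the log cannot tell us.
            outcomes[w] = None
    return outcomes
```

For instance, a logged wait of 10 min with no event reveals the outcome for 5 min but not for 15 min, whereas a logged wait of 10 min with the event at minute 4 reveals the outcome for every candidate.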

We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning (randomized exploration and unbiased counterfactual estimators) to leverage data collected by an existing policy to estimate the performance of new candidate policies, without actually deploying those policies. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which is then used by an implicit counterfactual estimator to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure, and show that it can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
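The idea of implicit probabilities can be illustrated with a simplified inverse-propensity sketch. This is not Sayer's estimator; it is a minimal example under stated assumptions: the logging policy picks a threshold at random with known probabilities, choosing a threshold reveals the reward of every smaller or equal threshold regardless of the outcome, and the candidate policy always picks one fixed threshold. The names `implicit_prob` and `implicit_ips_value` are hypothetical.

```python
from typing import Dict, List, Tuple

# One logged decision: (chosen threshold, rewards revealed for thresholds <= chosen).
LogEntry = Tuple[float, Dict[float, float]]


def implicit_prob(logging_probs: Dict[float, float], target: float) -> float:
    """Probability that the logging policy reveals feedback for `target`.

    Under the assumption above, feedback for `target` is revealed whenever
    the logged threshold is at least `target`.
    """
    return sum(p for a, p in logging_probs.items() if a >= target)


def implicit_ips_value(
    logs: List[LogEntry],
    logging_probs: Dict[float, float],
    target: float,
) -> float:
    """Estimate the average reward of always choosing `target`.

    Each sample that reveals the reward of `target` is weighted by the
    inverse of its implicit probability, which keeps the estimate unbiased
    as long as that probability is positive.
    """
    q = implicit_prob(logging_probs, target)
    if q <= 0:
        raise ValueError("target threshold is never revealed by the logging policy")
    total = 0.0
    for chosen, revealed in logs:
        if chosen >= target:
            total += revealed[target] / q
    return total / len(logs)
```

With a logging policy that picks thresholds 1 and 2 with probability 0.5 each, threshold 1 is revealed by every sample (implicit probability 1.0) while threshold 2 is revealed only half the time (implicit probability 0.5), so its revealed rewards are weighted twice as heavily.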

Published In

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing

November 2021

685 pages

ISBN: 9781450386388

DOI: 10.1145/3472883

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SoCC '21

SoCC '21: ACM Symposium on Cloud Computing

November 1 - 4, 2021

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
