DOI: 10.1145/3422604.3425940

Online Safety Assurance for Learning-Augmented Systems

Published: 04 November 2020

Abstract

Recently, deep learning has been successfully applied to a variety of networking problems. A fundamental challenge is that when the operational environment of a learning-augmented system differs from its training environment, such systems often make poorly informed decisions, resulting in degraded performance. We argue that safely deploying learning-driven systems requires the ability to determine, in real time, whether system behavior is coherent, so that the system can default to a reasonable heuristic when it is not. We term this the online safety assurance problem (OSAP). We present three approaches to quantifying decision uncertainty that differ in the signal used to infer uncertainty. We illustrate the usefulness of online safety assurance in the context of the recently proposed deep reinforcement learning (RL) approach to video streaming. While deep RL for video streaming outperforms other approaches when the operational and training environments match, it is dominated by simple heuristics when the two differ. Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits of ML without compromising safety.
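To make the switch-to-default idea concrete, below is a minimal sketch for a discrete-bitrate adaptive streaming setting. It is not the authors' implementation: it uses ensemble disagreement as the uncertainty signal (just one of several possible signals; the paper discusses three), and the names UncertaintyGate, threshold, BITRATES_KBPS, and buffer_based_bitrate are illustrative assumptions.

```python
# Sketch of an OSAP-style safety switch: monitor decision uncertainty
# online and fall back to a simple buffer-based heuristic when the
# learned policy looks unreliable. Hypothetical names and parameters.
import numpy as np

BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]  # hypothetical ABR ladder

def buffer_based_bitrate(buffer_sec, low=5.0, high=15.0):
    """Default heuristic: map the playback buffer level linearly onto
    the bitrate ladder (in the spirit of buffer-based rate adaptation)."""
    frac = min(max((buffer_sec - low) / (high - low), 0.0), 1.0)
    return BITRATES_KBPS[round(frac * (len(BITRATES_KBPS) - 1))]

class UncertaintyGate:
    """Serve actions from a learned policy, but default to the heuristic
    whenever an ensemble of policies disagrees too strongly on the state."""

    def __init__(self, ensemble, threshold=0.15):
        self.ensemble = ensemble    # callables: state -> action-prob vector
        self.threshold = threshold  # disagreement level that triggers fallback

    def decision_uncertainty(self, state):
        # Mean total-variation distance between each member's action
        # distribution and the ensemble mean; 0 means full agreement.
        probs = np.stack([policy(state) for policy in self.ensemble])
        mean = probs.mean(axis=0)
        return float(np.abs(probs - mean).sum(axis=1).mean() / 2.0)

    def choose_bitrate(self, state, buffer_sec):
        if self.decision_uncertainty(state) > self.threshold:
            return buffer_based_bitrate(buffer_sec)  # incoherent: play it safe
        probs = np.stack([policy(state) for policy in self.ensemble])
        return BITRATES_KBPS[int(probs.mean(axis=0).argmax())]
```

Other signals could replace ensemble disagreement in decision_uncertainty, e.g., the policy's own confidence (softmax entropy) or a density model flagging out-of-distribution inputs; the gating structure stays the same.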


Published In

HotNets '20: Proceedings of the 19th ACM Workshop on Hot Topics in Networks
November 2020, 228 pages
ISBN: 9781450381451
DOI: 10.1145/3422604


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. network protocol design
  2. reinforcement learning
  3. sequential decision making
  4. video streaming

Qualifiers

  • Research-article

Funding Sources

  • Israel Science Foundation (ISF)
  • NSF-BSF grant

Conference

HotNets '20

Acceptance Rates

Overall Acceptance Rate 110 of 460 submissions, 24%


Article Metrics

  • Downloads (Last 12 months): 36
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Jan 2025

Cited By
  • (2024) When Classic Meets Intelligence: A Hybrid Multipath Congestion Control Framework. IEEE/ACM Transactions on Networking 32(4):3575-3590. DOI: 10.1109/TNET.2024.3395356. Online publication date: Aug-2024
  • (2023) Explore to generalize in zero-shot RL. Proceedings of the 37th International Conference on Neural Information Processing Systems, 63174-63196. DOI: 10.5555/3666122.3668880. Online publication date: 10-Dec-2023
  • (2023) Challenging social media threats using collective well-being-aware recommendation algorithms and an educational virtual companion. Frontiers in Artificial Intelligence 5. DOI: 10.3389/frai.2022.654930. Online publication date: 9-Jan-2023
  • (2023) Towards Future-Based Explanations for Deep RL Network Controllers. ACM SIGMETRICS Performance Evaluation Review 51(2):100-102. DOI: 10.1145/3626570.3626607. Online publication date: 2-Oct-2023
  • (2023) Tackling Deployability Challenges in ML-Powered Networks. ACM SIGMETRICS Performance Evaluation Review 51(2):94-96. DOI: 10.1145/3626570.3626605. Online publication date: 2-Oct-2023
  • (2023) Illuminating the hidden challenges of data-driven CDNs. Proceedings of the 3rd Workshop on Machine Learning and Systems, 94-103. DOI: 10.1145/3578356.3592574. Online publication date: 8-May-2023
  • (2023) Adaptive QoS-aware multipath congestion control for live streaming. Computer Networks 220:109470. DOI: 10.1016/j.comnet.2022.109470. Online publication date: Jan-2023
  • (2022) ACCeSS: Adaptive QoS-aware Congestion Control for Multipath TCP. 2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS54832.2022.9812886. Online publication date: 10-Jun-2022
  • (2021) A unified congestion control framework for diverse application preferences and network conditions. Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, 282-296. DOI: 10.1145/3485983.3494840. Online publication date: 2-Dec-2021
