DOI: 10.1145/3422604.3425940

Online Safety Assurance for Learning-Augmented Systems

Published: 04 November 2020

Abstract

Recently, deep learning has been successfully applied to a variety of networking problems. A fundamental challenge is that when the operational environment of a learning-augmented system differs from its training environment, such systems often make poorly informed decisions, resulting in degraded performance. We argue that safely deploying learning-driven systems requires the ability to determine, in real time, whether system behavior is coherent, so that the system can default to a reasonable heuristic when it is not. We term this the online safety assurance problem (OSAP). We present three approaches to quantifying decision uncertainty that differ in the signal used to infer uncertainty. We illustrate the usefulness of online safety assurance in the context of the recently proposed deep reinforcement learning (RL) approach to video streaming. While deep RL for video streaming outperforms other approaches when the operational and training environments match, it is dominated by simple heuristics when the two differ. Our preliminary findings suggest that transitioning to a default policy when decision uncertainty is detected is key to enjoying the performance benefits of ML without compromising safety.
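To make the switch-to-default idea concrete, below is a minimal sketch for a discrete-bitrate adaptive streaming setting. It is not the authors' implementation: it uses ensemble disagreement as the uncertainty signal (just one of several possible signals; the paper discusses three), and the names UncertaintyGate, threshold, BITRATES_KBPS, and buffer_based_bitrate are illustrative assumptions.

```python
# Sketch of an OSAP-style safety switch: monitor decision uncertainty
# online and fall back to a simple buffer-based heuristic when the
# learned policy looks unreliable. Hypothetical names and parameters.
import numpy as np

BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]  # hypothetical ABR ladder

def buffer_based_bitrate(buffer_sec, low=5.0, high=15.0):
    """Default heuristic: map the playback buffer level linearly onto
    the bitrate ladder (in the spirit of buffer-based rate adaptation)."""
    frac = min(max((buffer_sec - low) / (high - low), 0.0), 1.0)
    return BITRATES_KBPS[round(frac * (len(BITRATES_KBPS) - 1))]

class UncertaintyGate:
    """Serve actions from a learned policy, but default to the heuristic
    whenever an ensemble of policies disagrees too strongly on the state."""

    def __init__(self, ensemble, threshold=0.15):
        self.ensemble = ensemble    # callables: state -> action-prob vector
        self.threshold = threshold  # disagreement level that triggers fallback

    def decision_uncertainty(self, state):
        # Mean total-variation distance between each member's action
        # distribution and the ensemble mean; 0 means full agreement.
        probs = np.stack([policy(state) for policy in self.ensemble])
        mean = probs.mean(axis=0)
        return float(np.abs(probs - mean).sum(axis=1).mean() / 2.0)

    def choose_bitrate(self, state, buffer_sec):
        if self.decision_uncertainty(state) > self.threshold:
            return buffer_based_bitrate(buffer_sec)  # incoherent: play it safe
        probs = np.stack([policy(state) for policy in self.ensemble])
        return BITRATES_KBPS[int(probs.mean(axis=0).argmax())]
```

Other signals could replace ensemble disagreement in decision_uncertainty, e.g., the policy's own confidence (softmax entropy) or a density model flagging out-of-distribution inputs; the gating structure stays the same.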


Published In

HotNets '20: Proceedings of the 19th ACM Workshop on Hot Topics in Networks
November 2020, 228 pages
ISBN: 9781450381451
DOI: 10.1145/3422604


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. network protocol design
  2. reinforcement learning
  3. sequential decision making
  4. video streaming

Qualifiers

  • Research-article

Funding Sources

  • Israel Science Foundation (ISF)
  • NSF-BSF grant

Conference

HotNets '20

Acceptance Rates

Overall Acceptance Rate 110 of 460 submissions, 24%


Article Metrics

  • Downloads (Last 12 months): 36
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Jan 2025

Cited By
  • (2024) When Classic Meets Intelligence: A Hybrid Multipath Congestion Control Framework. IEEE/ACM Transactions on Networking 32(4):3575-3590. DOI: 10.1109/TNET.2024.3395356. Online publication date: Aug-2024
  • (2023) Explore to generalize in zero-shot RL. Proceedings of the 37th International Conference on Neural Information Processing Systems, 63174-63196. DOI: 10.5555/3666122.3668880. Online publication date: 10-Dec-2023
  • (2023) Challenging social media threats using collective well-being-aware recommendation algorithms and an educational virtual companion. Frontiers in Artificial Intelligence 5. DOI: 10.3389/frai.2022.654930. Online publication date: 9-Jan-2023
  • (2023) Towards Future-Based Explanations for Deep RL Network Controllers. ACM SIGMETRICS Performance Evaluation Review 51(2):100-102. DOI: 10.1145/3626570.3626607. Online publication date: 2-Oct-2023
  • (2023) Tackling Deployability Challenges in ML-Powered Networks. ACM SIGMETRICS Performance Evaluation Review 51(2):94-96. DOI: 10.1145/3626570.3626605. Online publication date: 2-Oct-2023
  • (2023) Illuminating the hidden challenges of data-driven CDNs. Proceedings of the 3rd Workshop on Machine Learning and Systems, 94-103. DOI: 10.1145/3578356.3592574. Online publication date: 8-May-2023
  • (2023) Adaptive QoS-aware multipath congestion control for live streaming. Computer Networks 220:109470. DOI: 10.1016/j.comnet.2022.109470. Online publication date: Jan-2023
  • (2022) ACCeSS: Adaptive QoS-aware Congestion Control for Multipath TCP. 2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS), 1-10. DOI: 10.1109/IWQoS54832.2022.9812886. Online publication date: 10-Jun-2022
  • (2021) A unified congestion control framework for diverse application preferences and network conditions. Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, 282-296. DOI: 10.1145/3485983.3494840. Online publication date: 2-Dec-2021
