DOI: 10.1145/3603269.3604845

XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale

Published: 01 September 2023

ABSTRACT

Quality and cost are two key considerations for video conferencing services. Service providers face a dilemma when selecting network tiers to build their infrastructure: relying on Internet links yields poor quality, while using premium links incurs excessive cost.

We present XRON, a hybrid elastic cloud overlay network for our planetary-scale video conferencing service. XRON differs from prior overlays in two distinct features. First, XRON is hybrid, i.e., it leverages both Internet and premium links to simultaneously achieve high quality and low cost. Second, XRON is elastic, i.e., it exploits elastic cloud resources to adaptively scale its capacity based on real-time demand. The data plane of XRON combines active probing and passive tracking for scalable link state monitoring, uses asymmetric forwarding based on heterogeneous bidirectional link qualities, and quickly reacts to sudden link degradations without control plane involvement. The control plane of XRON predicts video traffic based on application knowledge, and computes global forwarding paths and reaction plans with scalable algorithms. Large-scale deployment in DingTalk shows that XRON reduces video stall ratio and bad audio fluency by 77% and 65.2%, respectively, compared to using Internet links only, and reduces cost by 79% compared to using premium links only.
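To make the idea of hybrid, asymmetric forwarding concrete, the following is a minimal sketch and not XRON's actual algorithm, which the abstract does not specify. It assumes each direction of a node pair has independently measured latency and loss for both link tiers, and that a direction uses the cheaper Internet link whenever it meets a quality threshold. The names `LinkState`, `pick_tier`, `forwarding_table`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    """Measured one-way state of a link (hypothetical fields)."""
    latency_ms: float
    loss_rate: float  # fraction in [0, 1]


# Hypothetical thresholds; the abstract gives no concrete values.
MAX_LATENCY_MS = 150.0
MAX_LOSS_RATE = 0.02


def is_good(state: LinkState) -> bool:
    """Return True if the link meets both quality thresholds."""
    return state.latency_ms <= MAX_LATENCY_MS and state.loss_rate <= MAX_LOSS_RATE


def pick_tier(internet: LinkState, premium: LinkState) -> str:
    """Prefer the cheap Internet link when it is good enough, else use premium.

    The premium state is accepted for symmetry but unused in this
    simplified rule, which treats premium as the fallback tier.
    """
    return "internet" if is_good(internet) else "premium"


def forwarding_table(measurements):
    """Choose a tier per direction.

    measurements maps (src, dst) to {"internet": LinkState, "premium": LinkState}.
    Each direction is decided on its own, so A->B and B->A may differ.
    """
    return {
        pair: pick_tier(m["internet"], m["premium"])
        for pair, m in measurements.items()
    }


# Illustrative data: the Internet link is fine from A to B but lossy from B to A.
measurements = {
    ("A", "B"): {
        "internet": LinkState(latency_ms=80.0, loss_rate=0.001),
        "premium": LinkState(latency_ms=60.0, loss_rate=0.0),
    },
    ("B", "A"): {
        "internet": LinkState(latency_ms=85.0, loss_rate=0.08),
        "premium": LinkState(latency_ms=60.0, loss_rate=0.0),
    },
}
table = forwarding_table(measurements)
print(table)
```

Because each direction is evaluated separately, the two directions of the same node pair can end up on different tiers, which is the sense of "asymmetric" used in this sketch. A real system would also need to account for capacity, cost budgets, and multi-hop paths, none of which are modeled here.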


Published in

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023
1217 pages
ISBN: 9798400702365
DOI: 10.1145/3603269

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 554 of 3,547 submissions, 16%