ABSTRACT
Quality and cost are two key considerations for video conferencing services. Service providers face a dilemma when selecting network tiers to build their infrastructure: relying on Internet links yields poor quality, while using premium links incurs excessive cost.
We present XRON, a hybrid elastic cloud overlay network for our planetary-scale video conferencing service. XRON differs from prior overlays in two distinct ways. First, XRON is hybrid, i.e., it leverages both Internet and premium links to simultaneously achieve high quality and low cost. Second, XRON is elastic, i.e., it exploits elastic cloud resources to adaptively scale its capacity based on real-time demand. The data plane of XRON combines active probing and passive tracking for scalable link state monitoring, uses asymmetric forwarding based on heterogeneous bidirectional link qualities, and reacts quickly to sudden link degradations without involving the control plane. The control plane of XRON predicts video traffic based on application knowledge, and computes global forwarding paths and reaction plans with scalable algorithms. Large-scale deployment in DingTalk shows that XRON reduces video stall ratio and bad audio fluency by 77% and 65.2%, respectively, compared to using Internet links only, and reduces cost by 79%, compared to using premium links only.
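The abstract does not spell out XRON's path-selection algorithm. Purely to illustrate what "hybrid" and "asymmetric" forwarding mean, the hypothetical sketch below chooses a path independently for each direction: among the direct link and every one-relay detour, it picks the cheapest path that meets latency and loss thresholds, so a premium-tier relay is used only when the Internet link falls short. The `Link` fields, tier labels, prices, thresholds, and the one-relay limit are all assumptions; the sketch ignores capacity, traffic prediction, and the global optimization that the control plane actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One DIRECTED overlay link (hypothetical fields, for illustration only)."""
    latency_ms: float
    loss: float         # packet loss rate in [0, 1)
    tier: str           # "internet" or "premium" (hypothetical labels)
    cost_per_gb: float  # hypothetical unit price


# Keyed by (src, dst), so A->B and B->A may have different quality.
Topology = dict[tuple[str, str], Link]


def path_metrics(topo: Topology, path: list[str]) -> tuple[float, float, float]:
    """Return (total latency, end-to-end loss, total cost) of a path."""
    links = [topo[(a, b)] for a, b in zip(path, path[1:])]
    latency = sum(l.latency_ms for l in links)
    delivery = 1.0
    for l in links:
        delivery *= 1.0 - l.loss  # independent per-hop losses compound
    cost = sum(l.cost_per_gb for l in links)
    return latency, 1.0 - delivery, cost


def pick_path(topo: Topology, src: str, dst: str,
              max_latency_ms: float = 150.0, max_loss: float = 0.02):
    """Cheapest direct or one-relay path meeting both thresholds, else None."""
    nodes = {n for pair in topo for n in pair}
    candidates = [[src, dst]] + [[src, r, dst] for r in sorted(nodes - {src, dst})]
    feasible = []
    for p in candidates:
        if all((a, b) in topo for a, b in zip(p, p[1:])):
            lat, loss, cost = path_metrics(topo, p)
            if lat <= max_latency_ms and loss <= max_loss:
                feasible.append((cost, lat, p))  # prefer low cost, then low latency
    return min(feasible)[2] if feasible else None


# Hypothetical topology: the direct Internet link is lossy only from A to B.
topo: Topology = {
    ("A", "B"): Link(80, 0.05, "internet", 0.01),
    ("B", "A"): Link(80, 0.005, "internet", 0.01),
    ("A", "R"): Link(30, 0.001, "premium", 0.08),
    ("R", "B"): Link(40, 0.001, "premium", 0.08),
    ("B", "R"): Link(40, 0.001, "premium", 0.08),
    ("R", "A"): Link(30, 0.001, "premium", 0.08),
}

print(pick_path(topo, "A", "B"))  # ['A', 'R', 'B']: detour over premium relay
print(pick_path(topo, "B", "A"))  # ['B', 'A']: cheap direct Internet link suffices
```

Because each direction is evaluated on its own directed links, the two directions of the same call can end up on different tiers, which is the intuition behind asymmetric forwarding over heterogeneous bidirectional link qualities.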
Index Terms
- XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale