DOI: 10.1145/3589334.3645321
research-article

ARES: Predictable Traffic Engineering under Controller Failures in SD-WANs

Published: 13 May 2024

Abstract

Emerging web applications (e.g., video streaming and Web of Things applications) account for a large share of traffic in Wide Area Networks (WANs) and generate traffic with diverse Quality of Service (QoS) requirements. Software-Defined Wide Area Networks (SD-WANs) offer a promising opportunity to improve Traffic Engineering (TE), which aims to provide differentiated QoS for numerous web applications. However, SD-WANs are managed by controllers, and unpredictable controller failures can undermine flexible network management. Switches previously controlled by the failed controllers may go offline, and flows traversing these offline switches lose path programmability, i.e., the ability to be routed on other available forwarding paths. These offline flows therefore stay on their previous paths and cannot be rerouted to accommodate traffic variations, leading to severe TE performance degradation. Existing recovery solutions reassign offline switches to other active controllers to recover the degraded path programmability, but they cannot promise good TE performance because higher path programmability does not necessarily translate into satisfactory TE performance. In this paper, we propose ARES to provide predictable TE performance under controller failures. We formulate an optimization problem that maintains predictable TE performance by jointly considering flow rerouting and fine-grained flow-controller reassignment using P4 Runtime, and we propose ARES to solve this problem efficiently. Extensive simulation results demonstrate that our problem formulation achieves load balancing performance comparable to the optimal TE solution without controller failures, and that ARES significantly improves average load balancing performance by up to 43.36% with low computation time compared with existing solutions.
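To make the link between path programmability and load balancing concrete, the following is a minimal sketch on a toy instance. It is not the ARES formulation or algorithm. It assumes that load balancing is measured as maximum link utilization, and the topology, demands, and the names `CAPACITY`, `DEMAND`, `PATHS`, `max_link_utilization`, and `best_routing` are all hypothetical. Flows that have been reassigned to an active controller are treated as programmable and may change paths; offline flows stay where they are.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): three links, two flows,
# each flow has a direct path and a two-hop detour.
CAPACITY = {"a-b": 10.0, "a-c": 10.0, "c-b": 10.0}
DEMAND = {"f1": 6.0, "f2": 6.0}
PATHS = {
    "f1": [["a-b"], ["a-c", "c-b"]],
    "f2": [["a-b"], ["a-c", "c-b"]],
}


def max_link_utilization(choice):
    """Return the highest load/capacity ratio over all links.

    `choice` maps each flow to the index of the path it currently uses.
    """
    load = {link: 0.0 for link in CAPACITY}
    for flow, idx in choice.items():
        for link in PATHS[flow][idx]:
            load[link] += DEMAND[flow]
    return max(load[link] / CAPACITY[link] for link in CAPACITY)


def best_routing(current, programmable):
    """Brute-force the path choice that minimizes maximum link utilization.

    Only flows in `programmable` (assumed to be reassigned to an active
    controller) may change paths; all other flows keep their current path.
    """
    flows = sorted(programmable)
    best_choice = dict(current)
    best_mlu = max_link_utilization(current)
    for combo in product(*(range(len(PATHS[f])) for f in flows)):
        candidate = dict(current)
        candidate.update(zip(flows, combo))
        mlu = max_link_utilization(candidate)
        if mlu < best_mlu:
            best_choice, best_mlu = candidate, mlu
    return best_choice, best_mlu


if __name__ == "__main__":
    current = {"f1": 0, "f2": 0}  # both flows share the direct link a-b
    print(best_routing(current, set()))    # no flow is programmable
    print(best_routing(current, {"f2"}))   # f2 has been reassigned
```

In this toy case, both flows start on the same link and overload it. With no programmable flow the routing cannot change, while making a single flow programmable is enough to move it to the detour and relieve the shared link. Brute-force enumeration is used only because the instance is tiny; it does not scale and stands in for a proper optimization solver.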

Cited By

  • (2025) Maintaining Predictable Traffic Engineering Performance Under Controller Failures for Software-Defined WANs. IEEE Journal on Selected Areas in Communications 43:2 (524-536). DOI: 10.1109/JSAC.2025.3528814. Online publication date: Feb-2025

        Published In

        WWW '24: Proceedings of the ACM Web Conference 2024
        May 2024
        4826 pages
        ISBN: 9798400701719
        DOI: 10.1145/3589334

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. controller failures
        2. software-defined wide area networks
        3. traffic engineering
        4. web services

        Qualifiers

        • Research-article

        Funding Sources

        • Zhejiang Lab Open Research Project

        Conference

        WWW '24
        WWW '24: The ACM Web Conference 2024
        May 13 - 17, 2024
        Singapore, Singapore

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
