DOI: 10.1145/3452296.3472901
research-article

A composition framework for change management

Published: 09 August 2021

Abstract

Change management has been a long-standing challenge for network operations. The large scale and diversity of networks, their complex dependencies, and their continuous evolution through technology and software updates, combined with the risk of service impact, make it very difficult to manage changes effectively. In this paper, we use data from a large service provider and the experiences of its operations teams to highlight the need for quick and easy adaptation of change management capabilities to keep up with continuous network changes. We propose a new framework, CORNET (COmposition fRamework for chaNge managEmenT), built on five key ideas: modularization of changes into building blocks, flexible composition into change workflows, change plan optimization, change impact verification, and automated translation of high-level change management intent into low-level implementations and mathematical models. We demonstrate the effectiveness of CORNET using real-world data collected from 4G and 5G cellular networks and from virtualized services such as VPN and SDWAN running in the cloud, as well as experiments conducted on a testbed of virtualized network functions. We also share our operational experiences and lessons learned from successfully using CORNET within a large service provider network over the last three years.

Supplementary Material

gupta-public-review (160-public-review.pdf)
A Composition Framework for Change Management: Public Review
MP4 File (video-presentation.mp4)
Conference Presentation Video



Published In

SIGCOMM '21: Proceedings of the 2021 ACM SIGCOMM 2021 Conference
August 2021
868 pages
ISBN:9781450383837
DOI:10.1145/3452296
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. change plan optimization
  2. composition framework
  3. impact verification
  4. network change management

Qualifiers

  • Research-article

Conference

SIGCOMM '21
Sponsor:
SIGCOMM '21: ACM SIGCOMM 2021 Conference
August 23 - 27, 2021
Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 79
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 15 Feb 2025

Cited By

  • (2024) Interpretable Failure Localization for Microservice Systems Based on Graph Autoencoder. ACM Transactions on Software Engineering and Methodology 34:2 (1-28). DOI: 10.1145/3695999. Online publication date: 13-Sep-2024
  • (2024) Eagle: Toward Scalable and Near-Optimal Network-Wide Sketch Deployment in Network Measurement. Proceedings of the ACM SIGCOMM 2024 Conference (291-310). DOI: 10.1145/3651890.3672244. Online publication date: 4-Aug-2024
  • (2024) Topaz: Declarative and Verifiable Authoritative DNS at CDN-Scale. Proceedings of the ACM SIGCOMM 2024 Conference (891-903). DOI: 10.1145/3651890.3672240. Online publication date: 4-Aug-2024
  • (2024) ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems. Proceedings of the ACM on Software Engineering 1:FSE (24-46). DOI: 10.1145/3643728. Online publication date: 12-Jul-2024
  • (2024) No More Data Silos: Unified Microservice Failure Diagnosis with Temporal Knowledge Graph. IEEE Transactions on Services Computing (1-14). DOI: 10.1109/TSC.2024.3489444. Online publication date: 2024
  • (2024) Change Management and Advanced Wireless Network Implementation: Detection of Monetization Opportunities and Challenges. IEEE Transactions on Engineering Management 71 (2524-2534). DOI: 10.1109/TEM.2022.3187650. Online publication date: 2024
  • (2024) CloudPlanner: Minimizing Upgrade Risk of Virtual Network Devices for Large-Scale Cloud Networks. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications (741-750). DOI: 10.1109/INFOCOM52122.2024.10621109. Online publication date: 20-May-2024
  • (2022) Towards automatic troubleshooting for user-level performance degradation in cellular services. Proceedings of the 28th Annual International Conference on Mobile Computing And Networking (716-728). DOI: 10.1145/3495243.3560535. Online publication date: 14-Oct-2022
  • (2022) Identifying Erroneous Software Changes through Self-Supervised Contrastive Learning on Time Series Data. 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE) (366-377). DOI: 10.1109/ISSRE55969.2022.00043. Online publication date: Oct-2022
