ABSTRACT
This paper presents Hoyan, the first reported large-scale deployment of configuration verification in a global-scale wide area network (WAN). Hoyan has been running in production for more than two years and is currently used for all critical configuration auditing and updates on the WAN. We highlight the designs and real-life experience that make Hoyan accurate and scalable in practice. For accuracy in the face of inconsistent vendor-specific behaviors (VSBs) across devices, Hoyan continuously discovers flaws in the device behavior models, helping operators fix those models. For scalability to our global WAN, Hoyan introduces a "global-simulation & local formal-modeling" strategy that models uncertainties at small scale and aggressively prunes possibilities during protocol simulation. Hoyan achieves near-100% verification accuracy after detecting and fixing O(10) VSBs on our WAN. It has prevented many potential service failures resulting from misconfiguration and, in 2019, reduced the failure rate of updates on our WAN by more than half.
Index Terms
- Accuracy, Scalability, Coverage: A Practical Configuration Verifier on a Global WAN