DOI: 10.1145/3152434.3152452

Stop Rerouting!: Enabling ShareBackup for Failure Recovery in Data Center Networks

Published: 30 November 2017

ABSTRACT

This paper introduces sharable backup as a novel solution to failure recovery in data center networks: the entire network shares a small pool of backup devices. The proposal is grounded in three key observations. First, traditional rerouting-based failure recovery is ineffective, because the bandwidth lost to failures degrades application performance drastically; failed devices should therefore be replaced to restore bandwidth. Second, failures in data centers are rare but destructive [11], so cost-effective backup options are desirable. Third, the emergence of configurable data center network architectures makes it feasible to bring backup devices online dynamically. We design the ShareBackup prototype architecture to realize this idea. Compared to rerouting-based solutions, ShareBackup provides more bandwidth with short path length at low cost.
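
To make the shared-pool idea concrete, the following is a minimal sketch and not the ShareBackup design itself. The class name BackupPool, the methods replace and repaired, and the switch identifiers are all hypothetical; the abstract does not say how a backup is selected or brought online, so the sketch simply takes any free spare.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BackupPool:
    """Hypothetical model: a small pool of spare switches shared network-wide."""

    spares: List[str]
    # Maps each failed switch to the spare currently standing in for it.
    assigned: Dict[str, str] = field(default_factory=dict)

    def replace(self, failed: str) -> Optional[str]:
        """Assign a spare to take over for `failed`; return None if none is free."""
        if failed in self.assigned:
            return self.assigned[failed]  # already covered by a spare
        if not self.spares:
            return None  # pool exhausted (assumption: caller handles this case)
        spare = self.spares.pop()
        self.assigned[failed] = spare
        return spare

    def repaired(self, failed: str) -> None:
        """Return the spare to the pool once the failed switch is fixed."""
        spare = self.assigned.pop(failed, None)
        if spare is not None:
            self.spares.append(spare)
```

With a single spare, a first failure is covered, a second concurrent failure is not, and the spare becomes available again once the first switch is repaired. This illustrates why the pool can stay small when failures are rare.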


References

  1. Coflow-Benchmark, https://github.com/coflow/coflow-benchmark/.
  2. FS.COM, http://www.fs.com/.
  3. Introducing data center fabric, the next-generation Facebook data center network, https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/.
  4. J. H. Ahn, N. Binkert, A. Davis, M. McLaren, and R. S. Schreiber. HyperX: Topology, Routing, and Packaging of Efficient Large-scale Networks. In SC '09, pages 41:1--41:11, Portland, Oregon, USA, November 2009.
  5. M. Al-Fares, A. Loukissas, and A. Vahdat. A Scalable, Commodity Data Center Network Architecture. In SIGCOMM '08, pages 63--74, Seattle, Washington, USA, August 2008.
  6. K. Chen, A. Singla, A. Singh, K. Ramachandran, L. Xu, Y. Zhang, X. Wen, and Y. Chen. OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility. In NSDI '12, San Jose, CA, April 2012.
  7. K. Chen, X. Wen, X. Ma, Y. Chen, Y. Xia, C. Hu, and Q. Dong. Wave-Cube: A Scalable, Fault-tolerant, High-performance Optical Data Center Architecture. In 2015 IEEE Conference on Computer Communications (INFOCOM), pages 1903--1911, April 2015.
  8. M. Chowdhury and I. Stoica. Coflow: A Networking Abstraction for Cluster Applications. In HotNets-XI, pages 31--36, Redmond, WA, 2012.
  9. N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat. Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers. In SIGCOMM '10, pages 339--350, New Delhi, India, August 2010.
  10. M. Ghobadi, R. Mahajan, A. Phanishayee, N. Devanur, J. Kulkarni, G. Ranade, P.-A. Blanche, H. Rastegarfar, M. Glick, and D. Kilper. ProjecToR: Agile Reconfigurable Data Center Interconnect. In Proceedings of the 2016 Conference on ACM SIGCOMM 2016 Conference, SIGCOMM '16, pages 216--229, Florianopolis, Brazil, August 2016.
  11. P. Gill, N. Jain, and N. Nagappan. Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM '11, pages 350--361, New York, NY, USA, 2011. ACM.
  12. A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM '09, pages 51--62, New York, NY, USA, 2009. ACM.
  13. C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu. BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers. In SIGCOMM '09, pages 63--74, Barcelona, Spain, August 2009.
  14. C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers. In SIGCOMM '08, pages 75--86, Seattle, Washington, USA, August 2008.
  15. D. Halperin, S. Kandula, J. Padhye, P. Bahl, and D. Wetherall. Augmenting Data Center Networks with Multi-gigabit Wireless Links. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM '11, pages 38--49, Toronto, Ontario, Canada, August 2011.
  16. N. Hamedazimi, Z. Qazi, H. Gupta, V. Sekar, S. R. Das, J. P. Longtin, H. Shah, and A. Tanwer. FireFly: A Reconfigurable Wireless Data Center Fabric Using Free-space Optics. In Proceedings of the 2014 ACM Conference on SIGCOMM, SIGCOMM '14, pages 319--330, Chicago, Illinois, USA, August 2014.
  17. K. He, J. Khalid, A. Gember-Jacobson, S. Das, C. Prakash, A. Akella, L. E. Li, and M. Thottan. Measuring Control Plane Latency in SDN-enabled Switches. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research, SOSR '15, pages 25:1--25:6, Santa Clara, California, 2015.
  18. S. Legtchenko, N. Chen, D. Cletheroe, A. Rowstron, H. Williams, and X. Zhao. XFabric: A Reconfigurable In-Rack Network for Rack-Scale Computers. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pages 15--29, Santa Clara, CA, 2016. USENIX Association.
  19. V. Liu, D. Halperin, A. Krishnamurthy, and T. Anderson. F10: A Fault-Tolerant Engineered Network. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 399--412, Lombard, IL, 2013. USENIX.
  20. Y. J. Liu, P. X. Gao, B. Wong, and S. Keshav. Quartz: A New Design Element for Low-latency DCNs. In SIGCOMM '14, pages 283--294, Chicago, Illinois, USA, August 2014.
  21. R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A Scalable Fault-tolerant Layer 2 Data Center Network Fabric. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM '09, pages 39--50, New York, NY, USA, 2009. ACM.
  22. G. Porter, R. Strong, N. Farrington, A. Forencich, P. Chen-Sun, T. Rosing, Y. Fainman, G. Papen, and A. Vahdat. Integrating Microsecond Circuit Switching into the Data Center. In SIGCOMM '13, pages 447--458, Hong Kong, China, August 2013.
  23. M. Schlansker, M. Tan, J. Tourrilhes, J. R. Santos, and S.-Y. Wang. Configurable optical interconnects for scalable datacenters. In Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013, pages 1--3. IEEE, 2013.
  24. A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, and A. Vahdat. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network. In SIGCOMM '15, pages 183--197, London, United Kingdom, August 2015. ACM.
  25. A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking Data Centers Randomly. In NSDI '12, pages 1--14, San Jose, California, USA, April 2012.
  26. M. Walraed-Sullivan, A. Vahdat, and K. Marzullo. Aspen Trees: Balancing Data Center Fault Tolerance, Scalability and Cost. In Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies, CoNEXT '13, pages 85--96, New York, NY, USA, 2013. ACM.
  27. G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. S. E. Ng, M. Kozuch, and M. Ryan. c-Through: Part-time Optics in Data Centers. In SIGCOMM '10, pages 327--338, New Delhi, India, August 2010.
  28. M. C. Wu, O. Solgaard, and J. E. Ford. Optical MEMS for Lightwave Communication. Journal of Lightwave Technology, 24(12):4433--4454, December 2006.
  29. Y. Xia and T. S. E. Ng. Flat-tree: A Convertible Data Center Network Architecture from Clos to Random Graph. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks, HotNets '16, pages 71--77, Atlanta, GA, November 2016.
  30. Y. Xia, X. S. Sun, S. Dzinamarira, D. Wu, X. S. Huang, and T. S. E. Ng. A tale of two topologies: Exploring convertible data center network architectures with flat-tree. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication, SIGCOMM '17, pages 295--308, New York, NY, USA, 2017. ACM.
  31. X. Zhou, Z. Zhang, Y. Zhu, Y. Li, S. Kumar, A. Vahdat, B. Y. Zhao, and H. Zheng. Mirror Mirror on the Ceiling: Flexible Wireless Links for Data Centers. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM '12, pages 443--454, Helsinki, Finland, August 2012.

Published in

HotNets '17: Proceedings of the 16th ACM Workshop on Hot Topics in Networks
November 2017
206 pages
ISBN: 9781450355698
DOI: 10.1145/3152434

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • research-article
  • Research
  • Refereed limited

Acceptance Rates

HotNets '17 paper acceptance rate: 28 of 124 submissions, 23%. Overall acceptance rate: 110 of 460 submissions, 24%.
