ABSTRACT
Datacenter networking has once again brought research on high-performance storage systems to the foreground. Many modern storage systems are built on commodity hardware and TCP/IP networking to reduce cost. In this paper, we highlight a group of problems that are present in such storage systems and that are all related to the use of TCP. As an alternative, we explore Trevi, a fountain-coding-based approach to distributing I/O requests that overcomes these problems while still scheduling resources efficiently across the networking and storage layers. We also discuss how receiver-driven flow and congestion control, combined with fountain coding, can guide the design of Trevi and provide a viable alternative to TCP for datacenter storage.
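To make the fountain-coding idea concrete, the following is a minimal sketch of a toy XOR-based, LT-style fountain code in Python. It is not Trevi's implementation. The function and class names, the seed-per-symbol scheme, and the loss pattern are all assumptions made for illustration, and the degree distribution is deliberately simplified (uniform over 1 to 3) rather than the robust soliton distribution used by practical LT codes. The point it illustrates is that any sufficiently large set of encoded symbols lets the receiver rebuild the data, so a lost symbol never has to be retransmitted.

```python
import random


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def symbol_indices(seed, k):
    """Derive which source blocks an encoded symbol covers from its seed.

    Sender and receiver run the same function, so only the seed has to
    travel with the payload. Simplified degree distribution (assumption).
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, k))
    return set(rng.sample(range(k), degree))


def encode_symbol(blocks, seed):
    """Produce one encoded symbol: the XOR of the selected source blocks."""
    out = bytes(len(blocks[0]))
    for i in symbol_indices(seed, len(blocks)):
        out = xor_bytes(out, blocks[i])
    return out


class PeelingDecoder:
    """Recover source blocks by repeatedly resolving degree-1 symbols."""

    def __init__(self, k):
        self.k = k
        self.known = {}    # block index -> recovered bytes
        self.pending = []  # [unresolved index set, payload] pairs

    def add(self, seed, payload):
        self.pending.append([symbol_indices(seed, self.k), payload])
        self._peel()

    def _peel(self):
        progress = True
        while progress:
            progress = False
            remaining = []
            for idx, payload in self.pending:
                # Strip every block that has already been recovered.
                for i in [i for i in idx if i in self.known]:
                    payload = xor_bytes(payload, self.known[i])
                    idx = idx - {i}
                if len(idx) == 1:
                    (i,) = idx
                    self.known[i] = payload
                    progress = True
                elif len(idx) > 1:
                    remaining.append([idx, payload])
                # Symbols with no unresolved blocks carry nothing new.
            self.pending = remaining

    def done(self):
        return len(self.known) == self.k

    def blocks(self):
        return [self.known[i] for i in range(self.k)]


def receive_with_loss(blocks, drop_every=3, max_symbols=10_000):
    """Feed symbols to a decoder while dropping some, with no retransmission."""
    decoder = PeelingDecoder(len(blocks))
    for seed in range(max_symbols):
        if seed % drop_every == 0:
            continue  # simulated loss: the symbol is simply never seen
        decoder.add(seed, encode_symbol(blocks, seed))
        if decoder.done():
            return decoder.blocks()
    raise RuntimeError("not decoded within max_symbols")
```

In this sketch the receiver keeps consuming symbols until decoding completes and can then tell senders to stop, which loosely mirrors the receiver-driven control mentioned above. Because symbols are interchangeable, they could in principle come from any replica holding the same blocks.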
Index Terms
- Trevi: watering down storage hotspots with cool fountain codes