research-article

Rethinking FTP: Aggressive block reordering for large file transfers

Published: 09 February 2009
Abstract

Whole-file transfer is a basic primitive for Internet content dissemination. Content servers are increasingly limited by disk arm movement, given the rapid growth in disk density, disk transfer rates, server network bandwidth, and content size. Individual file transfers are sequential, but the block access sequence on a content server is effectively random when many slow clients access large files concurrently. Although larger blocks can help improve disk throughput, buffering requirements increase linearly with block size.

This article explores a novel block reordering technique that can reduce server disk traffic significantly when large content files are shared. The idea is to transfer blocks to each client in any order that is convenient for the server. The server sends blocks to each client opportunistically in order to maximize the advantage from the disk reads it issues to serve other clients accessing the same file. We first illustrate the motivation and potential impact of aggressive block reordering using simple analytical models. Then we describe a file transfer system using a simple block reordering algorithm, called Circus. Experimental results with the Circus prototype show that it can improve server throughput by a factor of two or more in workloads with strong file access locality.
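The effect described above can be illustrated with a toy model. The sketch below is not the Circus algorithm from the article; it is a minimal simulation under stated assumptions: every client consumes one block per tick, in-order service uses a small LRU block cache, and reordered service uses a single cyclic scan whose blocks go to every active client that still lacks them. The function names, the tick model, and the parameters are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def sequential_disk_reads(num_blocks, arrivals, cache_blocks):
    """Count disk reads when each client fetches blocks 0..num_blocks-1 in order.

    Assumption: a client arriving at tick `a` requests block `t - a` at tick `t`,
    and the server keeps an LRU cache holding `cache_blocks` blocks.
    """
    cache = OrderedDict()
    reads = 0
    for tick in range(max(arrivals) + num_blocks):
        for start in arrivals:
            block = tick - start
            if not 0 <= block < num_blocks:
                continue  # client has not arrived yet or has already finished
            if block in cache:
                cache.move_to_end(block)  # cache hit: refresh LRU position
            else:
                reads += 1  # cache miss: one disk read
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return reads


def reordered_disk_reads(num_blocks, arrivals):
    """Count disk reads when one cyclic scan serves all clients in scan order.

    Assumption: each tick the server reads the block under a shared cursor and
    sends it to every active client that still needs it; a late arrival joins
    the scan mid-file and picks up the blocks it missed after the wrap-around.
    """
    pending = sorted(arrivals)
    missing = {}  # client id -> set of block numbers still needed
    reads = tick = cursor = nxt = 0
    while nxt < len(pending) or missing:
        while nxt < len(pending) and pending[nxt] <= tick:
            missing[nxt] = set(range(num_blocks))
            nxt += 1
        if missing:
            reads += 1
            for cid in list(missing):
                missing[cid].discard(cursor)
                if not missing[cid]:
                    del missing[cid]  # client holds the whole file
            cursor = (cursor + 1) % num_blocks
        tick += 1
    return reads
```

For three clients arriving 10 ticks apart for a 100-block file, `sequential_disk_reads(100, [0, 10, 20], cache_blocks=4)` gives 300 disk reads because the cache is too small to bridge the gap between clients, while `reordered_disk_reads(100, [0, 10, 20])` gives 120. The model ignores differing client speeds, which the article treats as central, so it shows only the direction of the effect.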

      Reviews

      Veronica Lagrange

      In the context of whole-file transfers, Anastasiadis et al. propose block reordering heuristics that maximize throughput by reducing disk traffic. Blocks that are already cached are transferred to all clients concurrently requesting that file, so file blocks may be delivered out of order. The environments that benefit most from this heuristic are those where the disk is the bottleneck or where a large number of clients request the same file concurrently. The authors also analyze in detail alternative methods to maximize throughput, such as optimizing cache and block sizes.

      They evaluated their heuristic with a prototype built on top of the file transfer protocol (FTP) daemon of the FreeBSD 4.5 operating system; both client and server were modified to support block reordering. To test this prototype, a workload consisting of multiple clients was generated and divided into three groups according to network link bandwidth: 1.544 megabits per second (Mb/s), 10 Mb/s, and 44.736 Mb/s. They then compared the results of their prototype, dubbed Circus, with those of an unmodified FreeBSD 4.5.

      In summary, as client requests (load) increase, Circus is better able to exploit network bandwidth. It also maintains constant disk throughput and constant response times, while the unmodified server's disk throughput and response times degrade under the same conditions. As file size increases, Circus is again better able to maintain network throughput and disk throughput. Overall, this paper makes a strong case for block reordering in the scenarios investigated.

      Online Computing Reviews Service
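      Because blocks may arrive out of order, the client has to place each one at its proper offset. The sketch below shows one way this could work and is not the modified FreeBSD client: the `(index, payload)` pair format, the function name `reassemble`, and its parameters are assumptions made for illustration.

```python
import io


def reassemble(blocks, block_size, file_size, out):
    """Write (index, payload) pairs, arriving in any order, at their offsets.

    Hypothetical format: `index` is the zero-based block number and `payload`
    holds that block's bytes. Returns True once every block has been written.
    """
    total = (file_size + block_size - 1) // block_size  # number of blocks
    seen = set()
    for index, payload in blocks:
        if not 0 <= index < total:
            raise ValueError(f"block index {index} out of range")
        if index in seen:
            continue  # ignore duplicate deliveries
        # The last block may be shorter than block_size.
        expected = min(block_size, file_size - index * block_size)
        if len(payload) != expected:
            raise ValueError(f"block {index} has unexpected length")
        out.seek(index * block_size)
        out.write(payload)
        seen.add(index)
    return len(seen) == total


# Usage: three blocks of a 10-byte file delivered last-first.
data = bytes(range(10))
arrived = [(2, data[8:10]), (0, data[0:4]), (1, data[4:8])]
target = io.BytesIO()
complete = reassemble(arrived, block_size=4, file_size=10, out=target)
```

      Tracking received block numbers in a set lets the client detect completion and discard duplicates without assuming any particular delivery order.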

      • Published in

        ACM Transactions on Storage  Volume 4, Issue 4
        January 2009
        116 pages
        ISSN:1553-3077
        EISSN:1553-3093
        DOI:10.1145/1480439
        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 February 2009
        • Revised: 1 April 2008
        • Accepted: 1 April 2008
        • Received: 1 December 2007

        Qualifiers

        • research-article
        • Research
        • Refereed
