Rethinking FTP: Aggressive block reordering for large file transfers

Authors:
Stergios V. Anastasiadis, University of Ioannina, Greece
Rajiv G. Wickremesinghe, Oracle Corporation, Redwood Shores, CA
Jeffrey S. Chase, Duke University, Durham, NC

ACM Transactions on Storage, Volume 4, Issue 4, Article No. 13, pp. 1–27. https://doi.org/10.1145/1480439.1480442

Published: 09 February 2009

Abstract

Whole-file transfer is a basic primitive for Internet content dissemination. Content servers are increasingly limited by disk arm movement, given the rapid growth in disk density, disk transfer rates, server network bandwidth, and content size. Individual file transfers are sequential, but the block access sequence on a content server is effectively random when many slow clients access large files concurrently. Although larger blocks can help improve disk throughput, buffering requirements increase linearly with block size.
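The linear growth in buffering can be made concrete with a back-of-the-envelope sketch. It assumes, as a simplification not stated in the abstract, that the server holds one block-sized buffer per active client; the function name, the client count, and the block sizes are illustrative values only.

```python
def buffer_bytes(num_clients: int, block_size: int, buffers_per_client: int = 1) -> int:
    """Total server buffer memory under the assumed per-client buffering model."""
    return num_clients * buffers_per_client * block_size


KIB = 1024
MIB = KIB * KIB

# Hypothetical server with 1000 concurrent clients and three candidate block sizes.
for block_kib in (64, 256, 1024):
    total = buffer_bytes(num_clients=1000, block_size=block_kib * KIB)
    print(f"{block_kib:>5} KiB blocks -> {total / MIB:.1f} MiB of buffers")
```

Under this model, doubling the block size doubles the memory the server must set aside, which is why larger blocks alone are a costly way to recover disk throughput.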

This article explores a novel block reordering technique that can reduce server disk traffic significantly when large content files are shared. The idea is to transfer blocks to each client in any order that is convenient for the server. The server sends blocks to each client opportunistically in order to maximize the advantage from the disk reads it issues to serve other clients accessing the same file. We first illustrate the motivation and potential impact of aggressive block reordering using simple analytical models. Then we describe a file transfer system using a simple block reordering algorithm, called Circus. Experimental results with the Circus prototype show that it can improve server throughput by a factor of two or more in workloads with strong file access locality.
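The effect described above can be illustrated with a toy simulation. This is a minimal sketch and not the Circus algorithm from the article: it assumes every client consumes exactly one block per time step, that the in-order server uses a small LRU block cache, and that the reordering server simply cycles through the file and sends each block it reads to every active client. All names (`in_order_reads`, `reordered_reads`, `arrivals`, `cache_blocks`) are hypothetical.

```python
from collections import OrderedDict


def in_order_reads(num_blocks, arrivals, cache_blocks):
    """Disk reads when each client receives blocks 0..N-1 in file order.

    A client that arrived at tick `start` asks for block `tick - start`.
    A request is served from the LRU cache when possible, else from disk.
    """
    cache = OrderedDict()
    reads = 0
    for tick in range(max(arrivals) + num_blocks):
        for start in arrivals:
            block = tick - start
            if not 0 <= block < num_blocks:
                continue  # client not yet arrived or already finished
            if block in cache:
                cache.move_to_end(block)  # cache hit, refresh recency
            else:
                reads += 1  # cache miss, go to disk
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return reads


def reordered_reads(num_blocks, arrivals):
    """Disk reads when the server picks the block order (toy cyclic policy).

    Each tick the server reads the block under a cyclic cursor once and
    sends it to every active client, so late arrivals start mid-file
    and pick up the blocks they skipped after the cursor wraps around.
    """
    missing = [set(range(num_blocks)) for _ in arrivals]
    order = [[] for _ in arrivals]  # block order seen by each client
    reads = cursor = tick = 0
    while any(missing):
        active = [c for c, start in enumerate(arrivals)
                  if start <= tick and missing[c]]
        if active:
            reads += 1  # one disk read shared by all active clients
            for c in active:
                if cursor in missing[c]:
                    missing[c].remove(cursor)
                    order[c].append(cursor)
            cursor = (cursor + 1) % num_blocks
        tick += 1
    return reads, order


if __name__ == "__main__":
    arrivals = [0, 30, 60]  # three clients, staggered by 30 ticks
    print("in-order disk reads: ", in_order_reads(100, arrivals, cache_blocks=8))
    reads, order = reordered_reads(100, arrivals)
    print("reordered disk reads:", reads)
    print("first block sent to the second client:", order[1][0])
```

With these illustrative parameters the in-order server issues 300 disk reads, because the staggered clients are too far apart to share an 8-block cache, while the reordering server issues 160. The price is that a client no longer receives the file front to back, so it has to place each block at its own offset in the output file as it arrives.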

Index Terms

Rethinking FTP: Aggressive block reordering for large file transfers
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
2. Networks
  1. Network protocols

Reviews

Reviewer: Veronica Lagrange

In the context of whole-file transfers, Anastasiadis et al. propose block reordering heuristics that maximize throughput by reducing disk traffic. Blocks that are already cached are transferred first to all clients concurrently requesting that file, so file blocks may be transferred out of order. The environments that benefit most from this heuristic are those where the disk is the bottleneck or where a large number of clients request the same file concurrently. The authors also analyze in detail alternative methods for maximizing throughput, such as optimizing cache and block sizes.

They evaluated their heuristic with a prototype built on top of the file transfer protocol (FTP) daemon of the FreeBSD 4.5 operating system; both client and server were modified to support block reordering. To test this prototype, a workload consisting of multiple clients was generated and divided into three groups according to network link bandwidth: 1.544 megabits per second (Mb/s), 10 Mb/s, and 44.736 Mb/s. The authors then compared the results of their prototype, dubbed Circus, with those of an unmodified FreeBSD 4.5.

In summary, as client requests (load) increase, Circus is better able to exploit network bandwidth. It also maintains constant disk throughput and constant response times, while the standard software loses disk bandwidth and its response times degrade under the same circumstances. As file size increases, Circus is again better able to maintain network throughput and disk throughput. Overall, this paper makes a strong case for block reordering in the scenarios investigated.

Online Computing Reviews Service

Published in

ACM Transactions on Storage Volume 4, Issue 4
January 2009
116 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/1480439

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 February 2009
- Revised: 1 April 2008
- Accepted: 1 April 2008
- Received: 1 December 2007
Author Tags
Disk access
file transfer protocols
scheduling
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 17
- Total Downloads: 566
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 0