Abstract
Obtaining optimal data transfer performance is of utmost importance to today’s data-intensive distributed applications and wide-area data replication services. Doing so requires using available network bandwidth and resources effectively, yet in practice transfers seldom reach the utilization they could. Tuning protocol parameters such as pipelining, parallelism, and concurrency can significantly increase utilization and performance; however, determining the best settings for these parameters is difficult because network conditions can vary greatly between sites and over time. In this paper, we present four application-level algorithms for heuristically tuning protocol parameters for data transfers in wide-area networks. Our algorithms dynamically tune the number of parallel data streams per file, the level of control channel pipelining, and the number of concurrent file transfers to fill network pipes. The presented algorithms are implemented as a standalone service and are also used together with external data scheduling tools such as Stork. The experimental results are promising, and our algorithms outperform existing solutions in this area.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arslan, E., Ross, B., Kosar, T. (2013). Dynamic Protocol Tuning Algorithms for High Performance Data Transfers. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_72
DOI: https://doi.org/10.1007/978-3-642-40047-6_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science, Computer Science (R0)