A highly-accurate and low-overhead prediction model for transfer throughput optimization

Cluster Computing

Abstract

A key bottleneck in data-intensive scalable computing systems is the inefficient use of the network links that connect collaborating institutions with their remote partners, data sources, and computational sites. To alleviate this bottleneck, we propose an application-layer throughput optimization model based on predicting the number of parallel streams. The new model extends our two previous models (Partial C-order and Full Second-order) to produce predictions with higher accuracy and lower overhead. Called Full C-order, it outperforms both of our previous models as well as the three most relevant models by others (the Partial Second-order, Hacker et al., and Altman et al. models) in both accuracy and efficiency. We test and compare these six models on emulated testbeds and in production environments using a wide variety of data set size, round-trip time (RTT), and bandwidth combinations. Our experiments confirm that the new model is superior to the other five.

References

  1. XSEDE. http://www.xsede.org/

  2. Energy sciences network (ESnet). http://www.es.net/

  3. Internet2. http://www.internet2.edu/

  4. ARRA/ANI testbed. https://sites.google.com/a/lbl.gov/ani-100g-network

  5. Garfinkel, S.: An evaluation of Amazon’s Grid computing services: EC2, S3 and SQS. Tech. Rep. TR-08-07, Aug. 2007

  6. Cho, B., Gupta, I.: Budget-constrained bulk data transfer via Internet and shipping networks. In: The 8th International Conference on Autonomic Computing (ICAC) (2011)

  7. Sivakumar, H., Bailey, S., Grossman, R.L.: PSockets: the case for application-level network striping for data intensive applications using high speed wide area networks. In: Proc. of Supercomputing (2000)

  8. Lee, J., Gunter, D., Tierney, B., Allcock, B., Bester, J., Bresnahan, J., Tuecke, S.: Applied techniques for high bandwidth data transfers across wide area networks. In: Proc. International Conference on Computing in High Energy and Nuclear Physics (CHEP01) (2001)

  9. Balakrishnan, H., Padmanabhan, V.N., Seshan, S., Stemm, M., Katz, R.H.: TCP behavior of a busy Internet server: analysis and improvements. In: Proc. of INFOCOM (1998)

  10. Hacker, T.J., Noble, B.D., Athey, B.D.: The end-to-end performance effects of parallel TCP sockets on a lossy wide area network. In: Proc. of IPDPS (2002)

  11. Eggert, L., Heidemann, J., Touch, J.: Effects of ensemble TCP. ACM Comput. Commun. Rev. 30(1), 15–29 (2000)

  12. Kola, G., Kosar, T., Livny, M.: Run-time adaptation of grid data-placement jobs. Scalable Comput., Pract. Exp. 6(3), 33–43 (2005)

  13. Karrer, R.P., Park, J., Kim, J.: Adaptive data block scheduling for parallel streams. Tech. Report, vol. 17(2) (2006)

  14. Yildirim, E., Suslu, I.H., Kosar, T.: Which network measurement tool is right for you? A multidimensional comparison study. In: Proc. of the 2008 9th IEEE/ACM International Conference on Grid Computing (GRID’08), Sep. 2008

  15. Lu, D., Qiao, Y., Dinda, P.A.: Characterizing and predicting TCP throughput on the wide area network. In: Proc. IEEE International Conference on Distributed Computing Systems (ICDCS05) (2005)

  16. Yildirim, E., Yin, D., Kosar, T.: Prediction of optimal parallelism level in wide area data transfers. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22(12) (2011)

  17. Yin, D., Yildirim, E., Kosar, T.: A data throughput prediction and optimization service for widely distributed many-task computing. IEEE Trans. Parallel Distrib. Syst. 22(6) (2011)

  18. Allcock, W.: GridFTP protocol specification. GGF (2003)

  19. Yildirim, E., Kosar, T.: Network-aware end-to-end data throughput optimization. In: Proc. of the Network-Aware Data Management Workshop (NDM 2011) (2012)

  20. Lu, D., Qiao, Y., Dinda, P.A., Bustamante, F.E.: Modeling and taming parallel TCP on the wide area network. In: Proc. of IPDPS (2005)

  21. Altman, E., Barman, D., Tuffin, B., Vojnovic, M.: Parallel TCP sockets: simple model, throughput and validation. In: Proc. IEEE Conference on Computer Communications (INFOCOM06) (2006)

  22. Emulab: network emulation testbed. [Online]. Available http://www.emulab.net/

  23. CRON: 10 Gbps high-speed network emulation testbed. [Online]. Available http://cron.loni.org

  24. Louisiana optical network initiative (LONI). http://www.loni.org/

  25. Crowcroft, J., Oechslin, P.: Differentiated end-to-end Internet services using a weighted proportional fair sharing TCP. ACM SIGCOMM Comput. Commun. Rev. 28(3), 53–69 (1998)

  26. Kola, G., Vernon, M.K.: Target bandwidth sharing using endhost measures. Perform. Eval. 64(9–12), 948–964 (2007)

  27. Yildirim, E., Kim, J., Kosar, T.: Optimizing the sample size for a cloud-hosted data scheduling service. In: Proc. 2nd International Workshop on Cloud Computing and Scientific Applications (CCSA in Conjunction with CCGRID’12) (2012)

  28. Mathis, M., Heffner, J., Reddy, R.: Web100: extended TCP instrumentation for research, education and diagnosis. ACM Comput. Commun. Rev. 33(3) (2003)

  29. Kosar, T., Livny, M.: Stork: making data placement a first class citizen in the grid. In: Proceedings of ICDCS’04, pp. 342–349, March 2004

  30. Kosar, T., Balman, M.: A new paradigm: data-aware scheduling in grid computing. Future Gener. Comput. Syst. 25(4), 406–413 (2009)

  31. Kosar, T., Balman, M., Yildirim, E., Kulasekaran, S., Ross, B.: Stork data scheduler: mitigating the data bottleneck in e-science. Philos. Trans. R. Soc. Lond. A 369, 3254–3267 (2011)

Acknowledgements

This project is partially supported by NSF under award numbers CNS-1131889 (CAREER), OCI-0926701 (STCI-Stork), and CCF-1115805 (CiC-Stork). In addition, this project has used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by NSF under grant number OCI-1053575. We would also like to thank Brandon Ross for his contributions to the Stork implementation of the Full C-order model.

Author information

Corresponding author

Correspondence to JangYoung Kim.

About this article

Cite this article

Kim, J., Yildirim, E. & Kosar, T. A highly-accurate and low-overhead prediction model for transfer throughput optimization. Cluster Comput 18, 41–59 (2015). https://doi.org/10.1007/s10586-013-0305-4

  • DOI: https://doi.org/10.1007/s10586-013-0305-4
