A lightweight, high performance communication protocol for grid computing

Cluster Computing

Abstract

This paper describes a lightweight, high-performance communication protocol for the high-bandwidth, high-delay networks typical of computational Grids. A distinguishing feature of the protocol is an extremely accurate classification mechanism that is efficient enough to diagnose the cause of data loss in real time, giving the controller the opportunity to respond to different causes of data loss in different ways. The simplest adaptive response, and the one discussed in this paper, is to trigger aggressive congestion control measures only when the data loss is diagnosed as network related. Even this very simple adaptation can have a tremendous impact on performance in a Grid setting, where the resources allocated to a long-running, data-intensive application can fluctuate significantly during its execution. In fact, we provide results showing that using the information provided by the classifier increased performance by over two orders of magnitude, depending on the dominant cause of data loss. We discuss the Bayesian statistical framework on which the classifier is based and the classification metrics that make this approach highly successful. We also discuss the integration of the classifier into the congestion control structures of an existing high-performance communication protocol, and provide empirical results showing that it correctly diagnosed the cause of data loss in over 98% of the experimental trials.
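As a rough illustration of the adaptive response described in the abstract, the sketch below applies Bayes' rule to pick the more probable cause of a loss event and reduces the sending rate only when that cause is network related. It is not the paper's classifier. The two cause labels, the discretized observation (`pattern_a` / `pattern_b`), the priors, the likelihood values, and the 0.5 back-off factor are all assumptions invented for this example; the abstract does not specify the classification metrics or the congestion control rule.

```python
# Hypothetical cause labels. The abstract only distinguishes
# network-related loss from loss with other causes.
NETWORK = "network"
OTHER = "other"

# Assumed prior P(cause) and likelihood P(observation | cause).
# These numbers are illustrative only, not measured values.
PRIOR = {NETWORK: 0.5, OTHER: 0.5}
LIKELIHOOD = {
    NETWORK: {"pattern_a": 0.9, "pattern_b": 0.1},
    OTHER: {"pattern_a": 0.2, "pattern_b": 0.8},
}


def posterior(prior, likelihood, observation):
    """Return P(cause | observation) for every cause via Bayes' rule."""
    unnormalized = {
        cause: prior[cause] * likelihood[cause][observation] for cause in prior
    }
    total = sum(unnormalized.values())
    return {cause: p / total for cause, p in unnormalized.items()}


def classify(prior, likelihood, observation):
    """Pick the cause with the highest posterior probability."""
    post = posterior(prior, likelihood, observation)
    return max(post, key=post.get)


def next_rate(current_rate, cause, backoff=0.5):
    """Back off only when the loss is diagnosed as network related."""
    if cause == NETWORK:
        return current_rate * backoff
    return current_rate


if __name__ == "__main__":
    for obs in ("pattern_a", "pattern_b"):
        cause = classify(PRIOR, LIKELIHOOD, obs)
        print(obs, cause, next_rate(100.0, cause))
```

With these assumed numbers, a `pattern_a` observation is attributed to the network and the rate is halved, while a `pattern_b` observation leaves the rate unchanged. A real controller would estimate the likelihoods from measured loss behavior rather than hard-code them.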

Author information

Corresponding author

Correspondence to Phillip M. Dickens.

About this article

Cite this article

Dickens, P.M. A lightweight, high performance communication protocol for grid computing. Cluster Comput 13, 47–66 (2010). https://doi.org/10.1007/s10586-009-0107-x

  • DOI: https://doi.org/10.1007/s10586-009-0107-x
