
Optimizing transport protocol parameters for large scale PC cluster and its evaluation with parallel data mining

Cluster Computing

Abstract

PC clusters have recently been studied intensively as a basis for the next generation of large-scale parallel computers. ATM technology is a strong candidate to become the de facto standard for high-speed communication networks. An ATM-connected PC cluster is therefore a promising platform, in cost/performance terms, for future high-performance computing. Data-intensive applications, such as data mining and ad hoc query processing in databases, are considered very important workloads for massively parallel processors, alongside conventional scientific calculations. It is therefore worthwhile to investigate the feasibility of such applications on an ATM-connected PC cluster. This paper reports on an ATM-connected cluster of 100 PCs and evaluates the characteristics of a transport-layer protocol on it. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Parallel data mining is implemented and evaluated on the cluster. Retransmission caused by cell loss at the ATM switch is analyzed, and retransmission parameters suitable for parallel processing on a large-scale PC cluster are identified. With its default parameters, TCP cannot provide good performance, because many collisions occur during the all-to-all multicasting executed on the large-scale PC cluster. With the TCP parameters tuned as proposed, performance improves for parallel data mining on 100 PCs.
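To make the window-size measurement mentioned in the abstract concrete, the following is a minimal sketch of a point-to-point throughput test. It is not the authors' measurement setup. It assumes that the socket buffer options `SO_SNDBUF` and `SO_RCVBUF` serve as a stand-in for the TCP window size parameter (on most systems they bound the window only indirectly, and the kernel may round or adjust the requested value), and it runs over the loopback interface rather than an ATM link. The function name `measure_throughput` and its parameters are hypothetical.

```python
import socket
import threading
import time


def measure_throughput(buf_bytes, total_bytes=4 * 1024 * 1024, chunk=64 * 1024):
    """Send total_bytes over a loopback TCP connection whose socket buffers
    were requested to be buf_bytes. Returns (bytes_received, MB_per_second)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the receive buffer before listen(), so that accepted
    # connections inherit it (assumption: typical BSD/Linux behaviour).
    server.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = [0]

    def sink():
        # Receiver side: accept one connection and count incoming bytes.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received[0] += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    client.connect(("127.0.0.1", port))

    payload = b"x" * chunk
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        n = min(chunk, total_bytes - sent)
        client.sendall(payload[:n])
        sent += n
    client.close()          # signals end-of-stream to the receiver
    receiver.join()         # wait until every byte has been counted
    elapsed = time.perf_counter() - start
    server.close()
    return received[0], received[0] / elapsed / 1e6


if __name__ == "__main__":
    # Sweep a few hypothetical buffer sizes and report the observed rate.
    for size in (8 * 1024, 64 * 1024, 256 * 1024):
        nbytes, rate = measure_throughput(size)
        print(f"requested buffer {size:>7} B: {nbytes} B received, {rate:.1f} MB/s")
```

On loopback the differences between buffer sizes may be small or noisy; the sketch only illustrates the shape of such an experiment, in which one parameter is varied while the transfer size is held fixed.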




Cite this article

Oguchi, M., Kitsuregawa, M. Optimizing transport protocol parameters for large scale PC cluster and its evaluation with parallel data mining. Cluster Computing 3, 15–23 (2000). https://doi.org/10.1023/A:1019007615458


  • DOI: https://doi.org/10.1023/A:1019007615458
