Abstract
This paper presents an effective scheme for clustering a huge data set on a PC cluster system in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme is designed to achieve three-level hierarchical parallel processing of massive data clustering. A divide-and-conquer approach to parallel data clustering is employed to perform coarse-grain parallel processing across multiple PCs using a message-passing mechanism. Moreover, by taking advantage of the GPU’s parallel processing capability, the proposed scheme exploits two types of fine-grain data parallelism at different levels of the nearest neighbor search, which is the most computationally intensive part of the data-clustering process. The performance of our scheme is compared with that of an implementation running entirely on the CPU. Experimental results clearly show that the proposed hierarchical parallel processing can substantially accelerate the data-clustering task. In particular, GPU co-processing is highly effective at improving the computational efficiency of parallel data clustering on a PC cluster. Although data transfer from the GPU to the CPU is generally costly, the acceleration obtained by GPU co-processing significantly reduces the total execution time of data clustering.
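The abstract identifies the nearest neighbor search as the dominant cost and describes coarse-grain, divide-and-conquer distribution of the work across PCs. As a hedged illustration only, the Python sketch below shows one common way to split k-means clustering across nodes: each node runs the nearest neighbor search on its own slice of the data and returns per-cluster partial sums and counts, which are then combined into new centroids. This is not the authors' implementation. The function names, the contiguous partitioning, and the sum-and-count reduction are assumptions; the nodes are simulated by a sequential loop instead of message passing, and no GPU code is involved.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; the square root is not needed for an argmin."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Nearest neighbor search: index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(point, centroids[j]))


def local_step(partition: Sequence[Vector], centroids: Sequence[Vector]) -> Tuple[List[Vector], List[int]]:
    """Work done by one hypothetical node on its own slice of the data.

    Returns per-cluster coordinate sums and member counts, which is all a
    coordinator needs to recompute the centroids. The loop over points (and the
    distance arithmetic inside it) is the part a fine-grain accelerator could
    parallelize.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        j = nearest_centroid(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(data: Sequence[Vector], centroids: Sequence[Vector],
                    num_nodes: int, iterations: int) -> List[Vector]:
    """Data-parallel k-means sketch; the simulated nodes run one after another."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's list
    k, dim = len(centroids), len(centroids[0])
    # Coarse-grain split: contiguous slices, one per simulated node.
    size = max(1, -(-len(data) // num_nodes))  # ceiling division, at least 1
    partitions = [list(data[i:i + size]) for i in range(0, len(data), size)]
    for _ in range(iterations):
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        # On a real cluster, this combination step would be a message-passing reduction.
        for sums, counts in (local_step(part, centroids) for part in partitions):
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        # An empty cluster keeps its previous centroid.
        centroids = [
            [s / total_counts[j] for s in total_sums[j]] if total_counts[j] else centroids[j]
            for j in range(k)
        ]
    return centroids
```

For example, with `data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]` and initial centroids `[[0.0, 0.0], [10.0, 10.0]]`, calling `parallel_kmeans(data, init, num_nodes=3, iterations=5)` yields centroids at (1/3, 1/3) and (31/3, 31/3). Because the partial sums are simply added, the result does not depend on how many nodes the data is split across, up to floating-point rounding.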
Cite this article
Takizawa, H., Kobayashi, H. Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing. J Supercomput 36, 219–234 (2006). https://doi.org/10.1007/s11227-006-8294-1