Abstract
This paper presents an effective scheme for clustering a huge data set using a commodity programmable graphics processing unit (GPU). Because the GPU has an application-specific architecture, one current research issue is how to map the data-clustering process onto the rendering pipeline. To take advantage of the GPU's parallel processing capability, our implementation scheme is designed to exploit the multi-grain single-instruction multiple-data (SIMD) parallelism of the nearest neighbor search, which is the most computationally intensive part of the data-clustering process. The performance of our scheme is compared with that of an implementation running entirely on the CPU. Experimental results clearly show that the parallelism of the nearest neighbor search allows our scheme to execute the data-clustering process efficiently. Although data transfer from the GPU to the CPU is generally costly, the acceleration provided by the GPU significantly reduces the total execution time of data clustering.
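To illustrate why the nearest neighbor search lends itself to parallel execution, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It assumes that clusters are represented by centroids and that distance is squared Euclidean; the function names and the sample data are hypothetical. The sketch shows two independent levels of work: the per-element terms inside one distance computation, and the per-vector searches, none of which depends on any other.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors.

    Fine grain: every (x - y) ** 2 term is independent of the others,
    so the terms could be computed in parallel and then summed.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to v.

    Ties are resolved in favor of the lowest index.
    """
    distances = [squared_distance(v, c) for c in centroids]
    return min(range(len(centroids)), key=distances.__getitem__)


def assign_clusters(data: Sequence[Vector],
                    centroids: Sequence[Vector]) -> List[int]:
    """Assign each data vector to its nearest centroid.

    Coarse grain: the search for one vector never reads the result
    of another, so all vectors could be processed concurrently.
    """
    return [nearest_centroid(v, centroids) for v in data]


# Hypothetical sample data: two well-separated groups in 2-D.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (4.8, 5.1)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_clusters(data, centroids)
print(labels)
```

In this sketch the two list comprehensions mark the places where a data-parallel device could run the same instruction over many elements at once; how the actual scheme binds these loops to the rendering pipeline is described in the paper itself, not here.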
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takizawa, H., Kobayashi, H. (2004). Multi-grain Parallel Processing of Data-Clustering on Programmable Graphics Hardware. In: Cao, J., Yang, L.T., Guo, M., Lau, F. (eds) Parallel and Distributed Processing and Applications. ISPA 2004. Lecture Notes in Computer Science, vol 3358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30566-8_5
DOI: https://doi.org/10.1007/978-3-540-30566-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24128-7
Online ISBN: 978-3-540-30566-8
eBook Packages: Computer Science, Computer Science (R0)