Abstract
K-Means is a typical partition-based clustering algorithm. It has two major drawbacks: it is sensitive to the choice of initial cluster centers, and outliers exert a significant influence on the clustering results. In addition, K-Means traverses and processes all the data multiple times, so it is inefficient on large data sets. To overcome these limitations, this paper first excludes outliers by requiring a minimum number of points within a d-dimensional hypersphere. The k cluster centers are then obtained by adjusting a threshold based on the idea of density. Finally, the K-Means algorithm is integrated with Compute Unified Device Architecture (CUDA), and time efficiency improves considerably by exploiting the computing power of the Graphics Processing Unit (GPU). We use the ratio of between-class distance to within-class distance, together with speedup, as the evaluation criteria. The experiments indicate that the proposed algorithm significantly improves the stability and running efficiency of K-Means.
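The outlier-exclusion step and the distance-ratio criterion mentioned in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`filter_outliers`, `between_within_ratio`), the parameters `radius` and `min_pts`, and the exact definitions (neighbor count within a Euclidean hypersphere; mean centroid-to-centroid distance divided by mean point-to-own-centroid distance) are all assumptions made for illustration. The GPU parallelization is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two d-dimensional points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_outliers(points, radius, min_pts):
    """Keep points that have at least `min_pts` other points inside the
    hypersphere of the given `radius` centered on them.

    Assumed reading of the abstract's "minimum number of points in the
    d-dimensional hypersphere area"; `radius` and `min_pts` are
    hypothetical parameter names.
    """
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and euclidean(p, q) <= radius
        )
        if neighbors >= min_pts:
            kept.append(p)
    return kept


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def between_within_ratio(clusters):
    """Assumed form of the evaluation ratio: mean pairwise distance between
    cluster centroids divided by mean distance from each point to its own
    centroid. Requires at least two clusters and nonzero within-distance."""
    centers = [centroid(c) for c in clusters]
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    between = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    n_points = sum(len(c) for c in clusters)
    within = sum(
        euclidean(p, ctr) for c, ctr in zip(clusters, centers) for p in c
    ) / n_points
    return between / within


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    clean = filter_outliers(data, radius=1.5, min_pts=2)
    print(clean)  # the isolated point (50, 50) is dropped
    print(between_within_ratio([clean[:3], clean[3:]]))
```

Under these assumed definitions, a larger ratio means tighter, better-separated clusters. The neighbor count is quadratic in the number of points, and each point's count is independent of the others, which is the kind of work that maps naturally onto GPU threads.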
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Yan, B., Zhang, Y., Yang, Z., Su, H., Zheng, H. (2014). DVT-PKM: An Improved GPU Based Parallel K-Means Algorithm. In: Huang, DS., Jo, KH., Wang, L. (eds) Intelligent Computing Methodologies. ICIC 2014. Lecture Notes in Computer Science, vol 8589. Springer, Cham. https://doi.org/10.1007/978-3-319-09339-0_60
DOI: https://doi.org/10.1007/978-3-319-09339-0_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09338-3
Online ISBN: 978-3-319-09339-0
eBook Packages: Computer Science (R0)