Abstract
Determining the number of clusters in a data set is a critical issue in cluster analysis. The Visual Assessment of (cluster) Tendency (VAT) algorithm is an effective tool for investigating cluster tendency: it represents a complex data set as an intuitive matrix image. However, VAT can be computationally expensive for large data sets because of its \(O(N^2)\) time complexity. In this paper, we propose an efficient parallel scheme that accelerates the original VAT using NVIDIA GPUs and the CUDA architecture. We show that, on a range of data sets, the GPU-based VAT scales well and achieves significant speedups over the original algorithm.
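For orientation, the sequential VAT reordering that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard VAT ordering (a Prim-like traversal of the dissimilarity matrix), not the parallel scheme proposed in the paper; the function names `vat_order` and `reorder` and the toy data are assumptions made for this example.

```python
def vat_order(R):
    """Return the VAT ordering for a symmetric dissimilarity matrix R (list of lists)."""
    n = len(R)
    # Start from one endpoint of the largest dissimilarity in R.
    start = max(range(n), key=lambda i: max(R[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # nearest[q] = smallest dissimilarity from q to any object already ordered.
    nearest = list(R[start])
    while remaining:
        # Pick the unordered object closest to the ordered set (Prim-like step).
        nxt = min(remaining, key=lambda q: nearest[q])
        order.append(nxt)
        remaining.remove(nxt)
        # Refresh the nearest-distance table with the newly added object.
        for q in remaining:
            if R[nxt][q] < nearest[q]:
                nearest[q] = R[nxt][q]
    return order


def reorder(R, order):
    """Apply the same permutation to the rows and columns of R."""
    return [[R[i][j] for j in order] for i in order]


# Toy data (hypothetical): five points on a line forming two groups.
points = [0, 10, 1, 11, 2]
R = [[abs(a - b) for b in points] for a in points]

order = vat_order(R)        # [0, 2, 4, 1, 3]
R_star = reorder(R, order)  # small values now sit in two diagonal blocks
```

Each of the N iterations scans up to N candidates, which is where the \(O(N^2)\) cost comes from. The minimum search and the table refresh inside the loop are data-parallel, so they are natural candidates for GPU acceleration; whether the paper parallelizes exactly these steps is not stated in the abstract.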
Acknowledgment
This work was partially supported by the NVIDIA GPU Education Center awarded to Tsinghua University.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Meng, T., Yuan, B. (2017). Parallel Visual Assessment of Cluster Tendency on GPU. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10235. Springer, Cham. https://doi.org/10.1007/978-3-319-57529-2_34
DOI: https://doi.org/10.1007/978-3-319-57529-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57528-5
Online ISBN: 978-3-319-57529-2
eBook Packages: Computer Science, Computer Science (R0)