Abstract
The visual assessment of (cluster) tendency (VAT) algorithm is an effective tool for investigating cluster tendency, which produces an intuitive image of matrix as the representation of complex datasets. The improved VAT (iVAT) incorporates a path-based distance metric into VAT to improve its effectiveness on complex-shaped datasets. The efficient formulation of the iVAT algorithm (efiVAT) further reduces the computational complexity of iVAT from \(O(N^3)\) to \(O(N^2)\). In this paper, we propose eVAT, an edge-based algorithm that can replicate the output of efiVAT but is more efficient and more suitable for parallelism. We also propose a parallel scheme to accelerate eVAT using NVIDIA GPU and CUDA architecture. We show that, on a range of datasets, the GPU-based eVAT features good scalability and can achieve significant speedups.










Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Bezdek, J.C., Hathaway, R.J.: VAT: a tool for visual assessment of (cluster) tendency. In: International Joint Conference on Neural Networks, pp. 2225–2230 (2002)
Bezdek, J.C., Hathaway, R.J., Huband, J.M.: Visual assessment of clustering tendency for rectangular dissimilarity matrices. IEEE Trans. Fuzzy Syst. 15(5), 890–903 (2007)
Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Elsevier, Hoboken (2012)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
CUDA Toolkit Documentation. http://docs.nvidia.com/cuda/index.html
Fang, W., Lu, M., Xiao, X., He, B., Luo, Q.: Frequent itemset mining on graphics processors. In: International Workshop on Data Management on New Hardware, Damon 2009, pp. 34–42. Providence June (2009)
Farber, R.: CUDA Application Design and Development. Morgan Kaufmann Publishers Inc., Burlington (2011)
Govindaraju, N., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high performance graphics co-processor sorting for large database management. In: ACM SIGMOD International Conference on Management of Data, pp. 325–336. Chicago June (2006)
Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 215–226. Paris June (2004)
Govindaraju, N.K., Raghuvanshi, N., Manocha, D.: Fast and approximate stream mining of quantiles and frequencies using graphics processors. In: ACM SIGMOD International Conference on Management of Data, pp. 611–622 (2005)
Hathaway, R.J., Bezdek, J.C., Huband, J.M.: Scalable visual assessment of cluster tendency for large data sets. Pattern Recognit. 39(7), 1315–1324 (2006)
Havens, T.C., Bezdek, J.C.: An efficient f of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowl. Data Eng. 24(5), 813–822 (2012)
Havens, T.C., Bezdek, J.C., Keller, J.M., Popescu, M.: Clustering in ordered dissimilarity data. Int. J. Intell. Syst. 24(5), 504–528 (2009)
He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, p. 46 (2007)
He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., Sander, P.: Relational joins on graphics processors. In: ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, pp. 511–524, Vancouver June (2008)
Huband, J.M., Bezdek, J.C., Hathaway, R.J.: Revised visual assessment of (cluster) tendency (reVAT). In: International Conference of the North American Fuzzy Information Processing Society, vol. 1, pp. 101–104 (2004)
Huband, J.M., Bezdek, J.C., Hathaway, R.J.: bigVAT: visual assessment of cluster tendency for large data sets. Pattern Recognit. 38(11), 1875–1886 (2005)
Larsen, E.S., Mcallister, D.: Fast matrix multiplies using graphics hardware. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, pp. 43–43 (2001)
Lichman, M. UCI Machine Learning Repository. Irvine, University of California, Irvine, School of Information and Computer Sciences. (2013). http://archive.ics.uci.edu/ml
Meng, T., Yuan, B.: Parallel visual assessment of cluster tendency on GPU. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 429–440. Springer (2017)
OpenCV User Guide. http://docs.opencv.org/2.4.13/doc/user_guide/user_guide.html
Pakhira, M.K.: Finding number of clusters before finding clusters. Procedia Technol. 4(11), 27–37 (2012)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12(10), 2825–2830 (2013)
Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6), 1389–1401 (2014)
Sart, D., Mueen, A., Najjar, W., Keogh, E., Niennattrakul, V.: Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In: ICDM 2010, The IEEE International Conference on Data Mining, , pp. 14–17. Sydney December (2010)
Sledge, I.J., Huband, J.M., Bezdek, J.C.: (Automatic) Cluster count extraction from unlabeled data sets. In: 5th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 3–13 (2008)
Wang, L., Geng, X., Bezdek, J., Leckie, C., Kotagiri, R.: SpecVAT: enhanced visual cluster analysis. In: IEEE International Conference on Data Mining, pp. 638–647 (2008)
Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J.: Automatically determining the number of clusters in unlabeled data sets. IEEE Trans. Knowl. Data Eng. 21(3), 335–350 (2009)
Wang, L., Nguyen, U.T., Bezdek, J.C., Leckie, C.A., Ramamohanarao, K.: iVAT and aVAT: enhanced visual analysis for cluster tendency assessment. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 16–27. Springer (2010)
Wilt, N.: The CUDA Handbook: A Comprehensive Guide to GPU Programming. Pearson Education, London (2013)
Acknowledgements
This work was partially supported by the NVIDIA GPU Education Center awarded to Tsinghua University.
Author information
Authors and Affiliations
Corresponding author
Additional information
This paper is an extension Version of the PAKDD’2017 Long Presentation paper “Parallel Visual Assessment of Cluster Tendency on GPU” [20].
Rights and permissions
About this article
Cite this article
Meng, T., Yuan, B. Parallel edge-based visual assessment of cluster tendency on GPU. Int J Data Sci Anal 6, 287–295 (2018). https://doi.org/10.1007/s41060-018-0100-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s41060-018-0100-7