Abstract
Cluster analysis groups data objects according to an assessment of their similarity features. It is an essential unsupervised technique for unlabelled datasets. A primary problem of data clustering methods such as k-means is that the value of 'k' must be supplied externally (by the user), and choosing it is intractable. Determining the number of clusters 'k' is known as assessing the clustering tendency. Existing visual approaches, namely visual assessment of tendency (VAT), cosine-based VAT (cVAT), and cosine-based spectral VAT (CS-VAT), are suitable for determining the cluster tendency of regular data. Clustering using Improved Visual Assessment of Tendency (clusiVAT) performs better than the other visual approaches for big data clustering. It uses sampling for faster results; however, it works well mainly for datasets generated from Gaussian distributions. The proposed work therefore develops enhanced visual approaches that obtain good-quality clusters for typical datasets. The performance of the enhanced visual approaches is demonstrated in an experimental study using benchmark datasets.
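To make the idea of visual tendency assessment concrete, the following is a minimal sketch of the basic VAT reordering step, not of the enhanced approaches this article proposes. It assumes Euclidean dissimilarities and NumPy; the function names `vat_order` and `vat_image` are illustrative and do not come from the article. Objects are reordered in a Prim-like fashion so that, when the reordered dissimilarity matrix is displayed as an image, compact clusters tend to appear as dark blocks along the diagonal.

```python
import numpy as np


def vat_order(dissim):
    """Return a VAT-style ordering (permutation of indices) for a square
    dissimilarity matrix, built by Prim-like nearest-neighbour growth."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    first = int(np.unravel_index(np.argmax(d), d.shape)[0])
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # min_dist[q]: smallest dissimilarity from q to any already-ordered object.
    min_dist = d[first].copy()
    for _ in range(n - 1):
        # Pick the unordered object closest to the ordered set.
        masked = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(masked))
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, d[nxt])
    return np.array(order)


def vat_image(points):
    """Compute Euclidean dissimilarities for `points` (assumption: rows are
    objects) and return the reordered matrix together with the ordering."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    order = vat_order(d)
    return d[np.ix_(order, order)], order
```

Displaying the returned matrix as a grayscale image and counting the dark diagonal blocks gives a visual estimate of 'k'. The full pairwise matrix costs quadratic memory, which is the reason sampling-based variants are used for large datasets.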
Cite this article
Chinnaiah, V., Yadav, B.V.R. An enhanced visual approach for accessing the clustering tendency of big data. Distrib Parallel Databases 41, 21–36 (2023). https://doi.org/10.1007/s10619-021-07330-5
DOI: https://doi.org/10.1007/s10619-021-07330-5