An enhanced visual approach for accessing the clustering tendency of big data

Distributed and Parallel Databases

Abstract

Cluster analysis groups data objects according to an assessment of their similarity features. It is an essential unsupervised technique for unlabelled datasets. A primary problem of data clustering methods such as k-means is that the number of clusters, 'k', must be supplied externally (by the user), which is hard to do reliably. Determining the number of clusters 'k' is known as assessing the clustering tendency. Existing visual approaches, i.e., visual assessment of tendency (VAT), cosine-based VAT (cVAT), and cosine-based spectral VAT (CS-VAT), are suitable for determining the cluster tendency of regular data. Clustering using Improved Visual Assessment of Tendency (ClusiVAT) performs better than the other visual approaches for big data clustering. It uses a sampling technique for faster results; however, it works well mainly for datasets generated from Gaussian distributions. Thus, the proposed work develops enhanced visual approaches for obtaining quality clusters for typical datasets. The performance of the enhanced visual approaches is demonstrated in an experimental study using benchmark datasets.
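
For readers unfamiliar with VAT, the following is a minimal Python sketch of the baseline reordering step as it is commonly described for the original VAT method. It is not the enhanced approach proposed in this article. The function names (`pairwise_dissimilarity`, `vat_order`, `vat_image`) are hypothetical and chosen only for illustration, and the `metric="cosine"` option is an assumption about how a cosine-based variant might swap in a different dissimilarity. The sampling used by ClusiVAT and the spectral step of CS-VAT are not shown.

```python
import numpy as np


def pairwise_dissimilarity(X, metric="euclidean"):
    """Return the n x n dissimilarity matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        # Assumed cosine dissimilarity: 1 - cosine similarity.
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        U = X / np.where(norms == 0, 1.0, norms)
        return np.clip(1.0 - U @ U.T, 0.0, 2.0)
    raise ValueError(f"unsupported metric: {metric}")


def vat_order(D):
    """Prim-like ordering of the objects in a square dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in D.
    start, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(start)]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    # nearest[q] = smallest dissimilarity from q to any already-ordered object.
    nearest = D[start].copy()
    for _ in range(n - 1):
        # Pick the unvisited object closest to the ordered set.
        candidate = int(np.argmin(np.where(visited, np.inf, nearest)))
        order.append(candidate)
        visited[candidate] = True
        nearest = np.minimum(nearest, D[candidate])
    return np.array(order)


def vat_image(D):
    """Return the reordered dissimilarity matrix and the ordering used."""
    order = vat_order(D)
    return np.asarray(D, dtype=float)[np.ix_(order, order)], order
```

When the reordered matrix is displayed as a grayscale image, compact and well-separated clusters tend to appear as dark square blocks along the diagonal, and counting those blocks gives a visual estimate of 'k'.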

Corresponding author

Correspondence to Veluru Chinnaiah.

Cite this article

Chinnaiah, V., Yadav, B.V.R. An enhanced visual approach for accessing the clustering tendency of big data. Distrib Parallel Databases 41, 21–36 (2023). https://doi.org/10.1007/s10619-021-07330-5

  • DOI: https://doi.org/10.1007/s10619-021-07330-5
