An extended visual methods to perform data cluster assessment in distributed data systems

Subba Reddy, K.; Rajendra Prasad, K.; Kamatam, Govardhan Reddy; Ramanjaneya Reddy, N.

doi:10.1007/s11227-021-04243-z

Published: 13 January 2022

Volume 78, pages 8810–8829, (2022)

The Journal of Supercomputing

K. Subba Reddy¹, K. Rajendra Prasad¹, Govardhan Reddy Kamatam² & N. Ramanjaneya Reddy³

Abstract

Assessing cluster tendency is one of the major problems in data clustering: it asks how many clusters an unlabeled dataset contains. This paper considers this preclustering problem for widely used clustering methods such as k-means and hierarchical clustering. Existing preclustering methods, such as visual assessment of cluster tendency (VAT), effectively estimate the cluster tendency (i.e., k in k-means). Enhanced methods, such as the improved VAT (iVAT) and related visual methods, have succeeded in determining the cluster tendency for complex and large datasets. Clustering using improved visual assessment of cluster tendency (clusiVAT) is a recent visual method that is widely used for large datasets. It uses sampling to handle the volume of the original data, but it does not address dimensionality, whereas big data in real-time applications is both large and high-dimensional. The proposed work therefore develops scalable visual methods that use linear subspace learning (LSL) techniques to overcome the curse of dimensionality. An empirical analysis on benchmark datasets demonstrates the efficiency of the proposed LSL-based visual methods.
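
The idea of reducing dimensionality before a VAT-style assessment can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes PCA as the linear subspace learning step and Euclidean distance as the dissimilarity, and it omits the sampling stage that clusiVAT uses. The function names (`pairwise_distances`, `vat_order`, `pca_project`, `lsl_vat`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean dissimilarity matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat_order(D):
    """Return a VAT ordering of the objects in a square dissimilarity matrix D.

    The ordering follows Prim's minimum-spanning-tree construction: start at
    one endpoint of the largest dissimilarity, then repeatedly append the
    unvisited object closest to any visited object.
    """
    n = D.shape[0]
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    visited = np.zeros(n, dtype=bool)
    visited[first] = True
    # nearest[j] = smallest dissimilarity from object j to any visited object.
    nearest = D[first].copy()
    for _ in range(n - 1):
        candidates = np.where(visited, np.inf, nearest)
        j = int(np.argmin(candidates))
        order.append(j)
        visited[j] = True
        nearest = np.minimum(nearest, D[j])
    return np.array(order)


def pca_project(X, k):
    """Project X onto its top-k principal components (assumed LSL step)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def lsl_vat(X, k):
    """Reduce X to k dimensions, then return the VAT-reordered matrix and order."""
    Z = pca_project(X, k)
    D = pairwise_distances(Z)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Displaying the reordered matrix as a grayscale image (for example with an image-plotting routine) shows dark square blocks along the diagonal when the data contain well-separated clusters; counting those blocks gives a visual estimate of k. The full n × n matrix is built in memory here, so this sketch is suitable only for small samples.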

Author information

Authors and Affiliations

Department of CSE, Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal, Andhra Pradesh, India
K. Subba Reddy & K. Rajendra Prasad
Department of CSE, G. Pulla Reddy Engineering College, Kurnool, Andhra Pradesh, India
Govardhan Reddy Kamatam
Department of CSE, KSRM College of Engineering, Kadapa, Andhra Pradesh, India
N. Ramanjaneya Reddy

Corresponding author

Correspondence to K. Rajendra Prasad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Subba Reddy, K., Rajendra Prasad, K., Kamatam, G.R. et al. An extended visual methods to perform data cluster assessment in distributed data systems. J Supercomput 78, 8810–8829 (2022). https://doi.org/10.1007/s11227-021-04243-z

Accepted: 30 November 2021
Published: 13 January 2022
Issue Date: April 2022
DOI: https://doi.org/10.1007/s11227-021-04243-z
