Abstract
The massive amount of available Earth science data opens an unprecedented opportunity to discover potentially valuable information. Earth science data are complex, nonlinear and high-dimensional, and the sparsity of data in high-dimensional space poses a major challenge for clustering. The shared nearest neighbor (SNN) clustering algorithm is a well-known and efficient method for handling high-dimensional spatiotemporal data. However, SNN does not cluster all the data, because it selects cluster boundaries rigidly. This paper reports a fuzzy shared nearest neighbor (FSNN) algorithm, an enhancement of SNN clustering that handles the data lying in the boundary regions by means of a fuzzy concept. Each cluster obtained can be characterized by its centroid, which summarizes the behavior of the ocean points in the cluster. A statistical measure is then used to find significant relations between the cluster centroids and existing climate indices; in this study, a correlation measure is used to find significant patterns such as teleconnections or dipoles. The experiments are performed on the Indian subcontinent, in the latitude range \(7.5^{\circ }{-}37.5^{\circ }\hbox {N}\) and longitude range \(67.5^{\circ }{-}97.5^{\circ }\hbox {E}\). Extensive experiments compare the proposed approach with existing clustering methods such as K-means, fuzzy C-means and SNN. The proposed FSNN algorithm not only handles the data lying in overlapping regions, but also finds more compact and well-separated clusters. FSNN gives better results in finding significant correlations between cluster centroids and existing climate indices, and the results are validated against a ground truth dataset.
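Two ingredients named in the abstract, shared nearest neighbor similarity and the correlation between a cluster centroid and a climate index, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the FSNN algorithm itself: the function names `snn_similarity` and `centroid_index_correlation` are hypothetical, Euclidean distance is assumed as the base measure, the similarity is a plain count of shared neighbors without any mutual-neighbor condition, and the fuzzy membership step is not shown because the abstract does not specify it.

```python
import numpy as np

def snn_similarity(X, k):
    """Count the k-nearest neighbors shared by every pair of rows in X."""
    n = X.shape[0]
    # Pairwise Euclidean distances (an assumption; another measure could be used).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]     # indices of the k nearest neighbors
    member = np.zeros((n, n), dtype=int)   # member[i, j] = 1 if j is a neighbor of i
    member[np.arange(n)[:, None], knn] = 1
    # Entry (i, j) is the number of neighbors that i and j have in common.
    return member @ member.T

def centroid_index_correlation(series, labels, cluster, index):
    """Pearson correlation between a cluster's mean time series and an index."""
    centroid = series[labels == cluster].mean(axis=0)
    return float(np.corrcoef(centroid, index)[0, 1])
```

Here `series` is assumed to hold one time series per grid point (rows) and `labels` the cluster assigned to each point; a strongly positive or negative correlation with a known index is the kind of signal the abstract refers to as a teleconnection or dipole.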
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Sharma, R., Verma, K. Enhanced shared nearest neighbor clustering approach using fuzzy for teleconnection analysis. Soft Comput 22, 8243–8258 (2018). https://doi.org/10.1007/s00500-017-2767-4
DOI: https://doi.org/10.1007/s00500-017-2767-4