Enhanced shared nearest neighbor clustering approach using fuzzy for teleconnection analysis

  • Methodologies and Application
Soft Computing

Abstract

Massive amounts of Earth science data open an unprecedented opportunity to discover potentially valuable information. Earth science data are complex, nonlinear and high-dimensional, and the sparsity of data in high-dimensional space poses a major challenge for clustering. The shared nearest neighbor (SNN) clustering algorithm is a well-known and efficient method for handling high-dimensional spatiotemporal data. However, SNN does not cluster all of the data, as it relies on rigid boundary selection. This paper reports the fuzzy shared nearest neighbor (FSNN) algorithm, an enhancement of SNN that uses a fuzzy concept to handle data lying in the boundary regions. Each cluster obtained can be characterized by its centroid, which summarizes the behavior of the ocean points in the cluster. A statistical measure, in this study correlation, is used to find significant relations between the cluster centroids and existing climate indices, and thereby to identify significant patterns such as teleconnections or dipoles. The experiments are performed on the Indian subcontinent region, over the latitude range \(7.5^{\circ}{-}37.5^{\circ}\,\mathrm{N}\) and longitude range \(67.5^{\circ}{-}97.5^{\circ}\,\mathrm{E}\). Extensive experiments compare the proposed approach with existing clustering methods such as K-means, fuzzy C-means and SNN. The proposed FSNN algorithm not only handles data lying in overlapping regions but also finds more compact and well-separated clusters. FSNN also gives better results in finding significant correlations between cluster centroids and existing climate indices, and the results are validated against a ground truth dataset.
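The abstract names two computational ingredients: a shared nearest neighbor similarity between grid points, and the correlation between a cluster centroid and a climate index. The following is a minimal Python sketch of both, assuming NumPy, Euclidean distance between time series, and the mutual-neighbor convention commonly used in SNN graphs. The function names `snn_similarity` and `centroid_index_correlation` are hypothetical, and the sketch does not reproduce the authors' FSNN algorithm, whose fuzzy membership step is not specified in this preview.

```python
import numpy as np


def snn_similarity(data, k):
    """Count shared k-nearest neighbors for every pair of points.

    data: (n_points, n_timesteps) array, one time series per row.
    k:    number of nearest neighbors (assumed smaller than n_points).
    Returns an (n_points, n_points) integer matrix.
    """
    n = data.shape[0]
    # Pairwise Euclidean distances between time series (an assumption;
    # other distance or similarity measures could be substituted).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest neighbors of each point.
    knn = np.argsort(dist, axis=1)[:, :k]

    # member[i, j] is True when j is one of the k nearest neighbors of i.
    member = np.zeros((n, n), dtype=bool)
    member[np.arange(n)[:, None], knn] = True

    # shared[i, j] = size of the intersection of the two neighbor lists.
    counts = member.astype(int)
    shared = counts @ counts.T

    # Keep a link only when i and j appear in each other's lists.
    mutual = member & member.T
    return np.where(mutual, shared, 0)


def centroid_index_correlation(data, labels, cluster_id, index_series):
    """Correlate one cluster's centroid time series with a climate index.

    labels:       cluster label per row of data (from any clustering).
    index_series: 1-D array with the same length as a row of data.
    Returns (centroid, Pearson correlation coefficient).
    """
    members = data[labels == cluster_id]
    centroid = members.mean(axis=0)  # mean time series of the cluster
    r = np.corrcoef(centroid, index_series)[0, 1]
    return centroid, r
```

In this sketch a pair of points receives a nonzero similarity only if each lies in the other's k-nearest-neighbor list; points with few strong links end up unclustered, which is the rigid behavior the abstract says FSNN is designed to relax.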

Author information

Corresponding author

Correspondence to Rika Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Sharma, R., Verma, K. Enhanced shared nearest neighbor clustering approach using fuzzy for teleconnection analysis. Soft Comput 22, 8243–8258 (2018). https://doi.org/10.1007/s00500-017-2767-4

  • DOI: https://doi.org/10.1007/s00500-017-2767-4
