Abstract
This paper addresses practical shortcomings of nearest-neighbor-based data mining techniques, especially clustering and outlier detection. For data sets with arbitrarily shaped clusters and varying density, it is difficult to choose proper parameters without a priori knowledge. To address this issue, we define a novel concept called the natural neighbor, which reflects the relationships between the elements of a data set better than the k-nearest neighbor does, and we present a graph called the weighted natural neighborhood graph for clustering and outlier detection. Furthermore, the whole process requires no parameter to be set for different data sets. Simulations on both synthetic and real-world data show the effectiveness of the proposed method.
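The abstract does not spell out how natural neighbors are found. As a rough illustration only, the sketch below follows one common formulation of a parameter-free natural-neighbor search: the neighborhood size r grows until every point is some other point's r-th nearest neighbor. The reverse-neighbor counting, the stopping rule (the hypothetical `max_stall` argument), and the use of mutual λ-nearest neighbors as the natural-neighbor sets are assumptions of this sketch, not details taken from this paper, and the weighted graph construction itself is not shown. The function name `natural_neighbor_search` is hypothetical.

```python
import numpy as np


def natural_neighbor_search(X, max_stall=3):
    """Hypothetical sketch of a parameter-free natural-neighbor search.

    Assumptions (not taken from the paper): the neighborhood size r grows
    from 1; in each round, every point's r-th nearest neighbor gains one
    reverse-neighbor count; the search stops once every point has at least
    one reverse neighbor, or once the number of points with none has stayed
    unchanged for `max_stall` consecutive rounds (e.g. because of outliers).
    Returns the final r (lambda) and, per point, its mutual lambda-nearest
    neighbors, used here as its natural neighbors.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(X)
    # Pairwise Euclidean distances; a -1 diagonal keeps each point first in its own row.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, -1.0)
    order = np.argsort(dist, axis=1, kind="stable")

    reverse_count = np.zeros(n, dtype=int)
    prev_orphans, stall, r = -1, 0, 0
    while r < n - 1:
        r += 1
        # Each point "votes" for its r-th nearest neighbor (column 0 is itself).
        np.add.at(reverse_count, order[:, r], 1)
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0:
            break
        stall = stall + 1 if orphans == prev_orphans else 0
        prev_orphans = orphans
        if stall >= max_stall:
            break

    lam = r
    knn = [set(order[i, 1:lam + 1].tolist()) for i in range(n)]
    # Natural neighbors (assumed definition): y is kept for x only if each is
    # among the other's lambda nearest neighbors.
    natural = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return lam, natural


lam, natural = natural_neighbor_search([[0.0], [1.0], [2.0], [3.0]])
print(lam, natural)  # 2 [{1}, {0, 2}, {1, 3}, {2}]
```

For four evenly spaced points on a line, the search settles at λ = 2 without any user-supplied k, and the two end points each receive a single natural neighbor. If a distant point at 100 is added and `max_stall=2` is used, the search stops at λ = 3 and that point is left with an empty natural-neighbor set, which hints at why such a structure can help with outlier detection.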
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61272194 and No. 61073058) and the Natural Science Foundation Project of CQ CSTC (cstc2013jcyjA40049).
About this article
Cite this article
Zhu, Q., Feng, J. & Huang, J. Weighted natural neighborhood graph: an adaptive structure for clustering and outlier detection with no neighborhood parameter. Cluster Comput 19, 1385–1397 (2016). https://doi.org/10.1007/s10586-016-0598-1
DOI: https://doi.org/10.1007/s10586-016-0598-1