Abstract
This paper presents an angle- and density-based data preprocessing method that simultaneously identifies outliers and boundary points (collectively called uniformly boundary points). Detecting such points is often more interesting than detecting normal points, since they represent valid, interesting, and potentially valuable patterns. An efficient local geometry-based method is proposed for detecting them using both angle and density measures. By combining multiple features (angles and density), the unified measure is adaptive and stable, and it can be used to evaluate the degree to which a given point is a boundary point. Compared with two related state-of-the-art approaches, our method better reflects the characteristics of the data and provides comparable accuracy on more data sets. Experimental results on a number of synthetic and real-world data sets demonstrate the effectiveness and efficiency of our method.
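As a rough illustration of how angle and density cues can be read from a point's local neighbourhood, the sketch below computes two per-point scores from the k nearest neighbours. It is not the unified measure proposed in this paper: the function names `knn_indices` and `local_scores`, the parameter `k`, and both scores (a "one-sidedness" value derived from neighbour directions and a "sparsity" value derived from neighbour distances) are assumptions chosen for illustration only.

```python
import numpy as np


def knn_indices(X, k):
    """Return indices and distances of the k nearest neighbours of each row of X.

    Brute-force O(n^2) search; adequate for a small illustrative data set.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist


def local_scores(X, k=8):
    """Hypothetical angle-like and density-like scores (not the paper's measure).

    one_sidedness: length of the mean unit vector towards the k neighbours.
        Near 0 when neighbours surround the point, near 1 when they all lie
        in roughly the same direction (boundary-like or outlier-like).
    sparsity: mean distance to the k neighbours (large in low-density areas).
    """
    X = np.asarray(X, dtype=float)
    idx, dist = knn_indices(X, k)
    diffs = X[idx] - X[:, None, :]                      # shape (n, k, d)
    units = diffs / np.maximum(dist[..., None], 1e-12)  # guard against zero distance
    one_sidedness = np.linalg.norm(units.mean(axis=1), axis=1)
    sparsity = dist.mean(axis=1)
    return one_sidedness, sparsity


# Usage: a 7 x 7 grid (one cluster) plus a single far-away point.
grid = np.array([[i, j] for i in range(7) for j in range(7)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
one_sidedness, sparsity = local_scores(X, k=8)
```

In this toy setting, an interior grid point has neighbours on all sides and so a one-sidedness close to zero, a corner point has all neighbours on one side, and the isolated point combines high one-sidedness with high sparsity. How such cues should be weighted into a single boundary degree is exactly what a unified measure has to decide, and the sketch leaves that open.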
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61375065, 61502208 and 61602066), the Scientific Research Foundation of the Education Department of Sichuan Province (17ZA0063), and the Scientific Research Foundation of CUIT (KYTZ201608). It was partially supported by the National Science Fund for Distinguished Young Scholars of China (Grant No. 61625204), the Sichuan Science and Technology Support Project (Grant No. 2014SZ0104), and the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20150522).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by A. Di Nola.
About this article
Cite this article
Li, X., Wu, X., Lv, J. et al. Automatic detection of boundary points based on local geometrical measures. Soft Comput 22, 3663–3674 (2018). https://doi.org/10.1007/s00500-017-2817-y
DOI: https://doi.org/10.1007/s00500-017-2817-y