Abstract
Outlier detection is an important problem in knowledge discovery in large data sets. Most recent work on outlier detection follows the framework of the Local Outlier Factor (LOF), which is based on density estimation theory. However, LOF has two disadvantages that restrict its detection performance. First, its local density estimate is not accurate enough to detect outliers in complex, large databases. Second, its performance depends on the parameter k, which determines the scale of the local neighborhood. Our approach adopts a variable kernel density estimate to address the first disadvantage and a weighted neighborhood density estimate to improve robustness to variations in k, while keeping the same framework as LOF. In addition, we propose a novel kernel function, named the Volcano kernel, which is more suitable for outlier detection. Experiments on several synthetic and real data sets demonstrate that our approach not only substantially improves detection performance but also scales relatively well to large data sets in comparison with state-of-the-art outlier detection methods.
This work is supported in part by the NSFC (Grant No. 60825204, 60935002 and 60903147) and the US NSF (Grant No. IIS-0812114 and CCF-1017828).
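The abstract names three ingredients: a variable kernel density estimate, a weighted neighborhood density, and an LOF-style ratio of the two. The sketch below illustrates that general idea only and is not the authors' RKOF implementation. The abstract does not define the Volcano kernel or the weighting scheme, so a Gaussian kernel and Gaussian distance weights are swapped in as assumptions, and every function name here is hypothetical.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def variable_kde(points, i, k, kdist):
    """Density at points[i] from kernels centered on its k nearest neighbors.

    Each kernel uses its own bandwidth: the k-distance of the neighbor it is
    centered on. Assumption: a Gaussian kernel stands in for the paper's
    Volcano kernel, which the abstract does not define.
    """
    dim = len(points[i])
    total = 0.0
    for dist, j in knn(points, i, k):
        h = max(kdist[j], 1e-12)  # guard against duplicate points
        norm = (2.0 * math.pi) ** (dim / 2.0) * h ** dim
        total += math.exp(-0.5 * (dist / h) ** 2) / norm
    return total / k


def outlier_scores(points, k=3):
    """LOF-style score: weighted neighbor density divided by own density.

    Scores near 1 suggest an inlier; much larger scores suggest an outlier.
    Assumption: neighbors are weighted by a Gaussian of their distance,
    which is an illustrative choice, not the weighting used in the paper.
    """
    n = len(points)
    kdist = [knn(points, i, k)[-1][0] for i in range(n)]
    dens = [variable_kde(points, i, k, kdist) for i in range(n)]
    scores = []
    for i in range(n):
        nbrs = knn(points, i, k)
        scale = max(kdist[i], 1e-12)
        weights = [math.exp(-0.5 * (dist / scale) ** 2) for dist, _ in nbrs]
        weighted = sum(w * dens[j] for w, (_, j) in zip(weights, nbrs))
        neighborhood_density = weighted / sum(weights)
        scores.append(neighborhood_density / max(dens[i], 1e-300))
    return scores


if __name__ == "__main__":
    # Five clustered points and one isolated point (the last one).
    data = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (5, 5)]
    for point, score in zip(data, outlier_scores(data, k=3)):
        print(point, score)
```

In this toy data the isolated point receives by far the largest score, while the clustered points score close to 1. The brute-force neighbor search is quadratic in the number of points; a spatial index would be needed for the large data sets the abstract targets.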
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gao, J., Hu, W., Zhang, Z.M., Zhang, X., Wu, O. (2011). RKOF: Robust Kernel-Based Local Outlier Detection. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_23
DOI: https://doi.org/10.1007/978-3-642-20847-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20846-1
Online ISBN: 978-3-642-20847-8
eBook Packages: Computer Science, Computer Science (R0)