Abstract
An outlier is an object that does not conform to the normal behavior of the data set. In data cleaning, outliers are identified to reduce data noise. In applications such as fraud detection and stock market analysis, outliers suggest abnormal behavior requiring further investigation. Existing outlier detection methods have focused on class outliers, and research on attribute outliers is limited, even though attribute outliers play an equal role in degrading data quality and reducing data mining accuracy. In this paper, we propose a novel method to detect attribute outliers from the deviating correlation behavior of attributes. We formulate three metrics to evaluate the outlier-ness of attributes, and introduce an adaptive factor to distinguish outliers from non-outliers. Experiments with both synthetic and real-world data sets indicate that the proposed method is effective in detecting attribute outliers.
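The abstract does not define the three metrics or the adaptive factor, so the sketch below is not the authors' method. It only illustrates the general idea of flagging an attribute value whose correlation with the other values in its record deviates from the rest of the data. It assumes categorical records with at least two attributes each; the function names `attribute_outlier_scores` and `flag_outliers`, the averaged conditional-probability score, and the mean + k·std cut-off (with k = 2) are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def attribute_outlier_scores(records):
    """Hypothetical score for every (record index, attribute index) cell:
    1 minus the average estimated P(attr i = v | attr j = w) over the other
    attributes j of the same record. Assumes >= 2 attributes per record."""
    single = Counter()  # occurrences of (attribute, value)
    pair = Counter()    # co-occurrences of (attr i, value v, attr j, value w), i != j
    for rec in records:
        for i, v in enumerate(rec):
            single[(i, v)] += 1
            for j, w in enumerate(rec):
                if i != j:
                    pair[(i, v, j, w)] += 1

    scores = {}
    for r, rec in enumerate(records):
        for i, v in enumerate(rec):
            # How often does value v accompany each of its neighbours' values?
            conds = [pair[(i, v, j, w)] / single[(j, w)]
                     for j, w in enumerate(rec) if j != i]
            scores[(r, i)] = 1.0 - mean(conds)
    return scores


def flag_outliers(scores, k=2.0):
    """Hypothetical adaptive cut-off: flag cells whose score exceeds the
    mean of all scores plus k population standard deviations."""
    values = list(scores.values())
    threshold = mean(values) + k * pstdev(values)
    return sorted(cell for cell, s in scores.items() if s > threshold)


if __name__ == "__main__":
    # Toy data: city, country, currency; the last record has a wrong country.
    records = ([("Paris", "France", "EUR")] * 5
               + [("Tokyo", "Japan", "JPY")] * 5
               + [("Paris", "Japan", "EUR")])
    print(flag_outliers(attribute_outlier_scores(records)))
```

On this toy table only the cell `(10, 1)`, the value "Japan" in the record ("Paris", "Japan", "EUR"), is flagged. "Paris" and "EUR" in that record score lower because each still agrees with one of its neighbours, which is why the score looks at an attribute's agreement with several other attributes rather than at the record as a whole.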
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koh, J.L.Y., Lee, M.L., Hsu, W., Lam, K.T. (2007). Correlation-Based Detection of Attribute Outliers. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_16
DOI: https://doi.org/10.1007/978-3-540-71703-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)