Correlation-Based Detection of Attribute Outliers

Koh, Judice L. Y.; Lee, Mong Li; Hsu, Wynne; Lam, Kai Tak

doi:10.1007/978-3-540-71703-4_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

An outlier is an object that does not conform to the normal behavior of the data set. In data cleaning, outliers are identified to reduce noise in the data. In applications such as fraud detection and stock market analysis, outliers suggest abnormal behavior that requires further investigation. Existing outlier detection methods have focused on class outliers, and research on attribute outliers remains limited, even though attribute outliers play an equal role in degrading data quality and reducing data mining accuracy. In this paper, we propose a novel method that detects attribute outliers from the deviating correlation behavior of attributes. We formulate three metrics to evaluate the outlier-ness of attributes and introduce an adaptive factor to distinguish outliers from non-outliers. Experiments with both synthetic and real-world data sets indicate that the proposed method is effective in detecting attribute outliers.

Author information

Authors and Affiliations

Institute for Infocomm Research, 119613, Singapore
Judice L. Y. Koh
School of Computing, National University of Singapore
Judice L. Y. Koh, Mong Li Lee & Wynne Hsu
Institute of High Performance Computing, 117528, Singapore
Kai Tak Lam

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

Cite this paper

Koh, J.L.Y., Lee, M.L., Hsu, W., Lam, K.T. (2007). Correlation-Based Detection of Attribute Outliers. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_16

DOI: https://doi.org/10.1007/978-3-540-71703-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science (R0)
