Correlation-Based Detection of Attribute Outliers

  • Conference paper
Advances in Databases: Concepts, Systems and Applications (DASFAA 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)

Abstract

An outlier is an object that does not conform to the normal behavior of the data set. In data cleaning, outliers are identified to reduce noise in the data. In applications such as fraud detection and stock market analysis, outliers suggest abnormal behavior that requires further investigation. Existing outlier detection methods have focused on class outliers, and research on attribute outliers is limited, even though attribute outliers play an equal role in degrading data quality and reducing data mining accuracy. In this paper, we propose a novel method to detect attribute outliers from the deviating correlation behavior of attributes. We formulate three metrics to evaluate the outlier-ness of attributes, and introduce an adaptive factor to distinguish outliers from non-outliers. Experiments with both synthetic and real-world data sets indicate that the proposed method is effective in detecting attribute outliers.
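
The abstract does not define the three metrics or the adaptive factor, so they are not reproduced here. As a rough illustration of the general idea only, the sketch below scores each attribute value by how often it co-occurs with the remaining values of its record, and flags values whose co-occurrence ratio is low even though the rest of the record is common. The function names, the ratio used as a score, and the fixed thresholds `max_confidence` and `min_support` are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def attribute_outlier_scores(records):
    """Map (row, attribute index) to count(record) / count(record without that attribute).

    Hypothetical score for illustration; it is not one of the paper's metrics.
    """
    n_attrs = len(records[0])
    full_counts = Counter(records)
    # For each attribute position, count the records projected onto the other attributes.
    rest_counts = [
        Counter(rec[:i] + rec[i + 1:] for rec in records) for i in range(n_attrs)
    ]
    scores = {}
    for row, rec in enumerate(records):
        for i in range(n_attrs):
            rest = rec[:i] + rec[i + 1:]
            scores[(row, i)] = full_counts[rec] / rest_counts[i][rest]
    return scores


def flag_outliers(records, max_confidence=0.2, min_support=3):
    """Return (row, attribute index, value) triples for suspicious attribute values.

    A value is flagged when the rest of its record occurs at least `min_support`
    times but is paired with this value in less than `max_confidence` of those
    occurrences. Both thresholds are fixed, assumed values, not the paper's
    adaptive factor.
    """
    scores = attribute_outlier_scores(records)
    n_attrs = len(records[0])
    rest_counts = [
        Counter(rec[:i] + rec[i + 1:] for rec in records) for i in range(n_attrs)
    ]
    flagged = []
    for (row, i), score in sorted(scores.items()):
        rec = records[row]
        rest = rec[:i] + rec[i + 1:]
        if rest_counts[i][rest] >= min_support and score < max_confidence:
            flagged.append((row, i, rec[i]))
    return flagged


# Toy data: nine consistent records and one record with a deviating third attribute.
records = [("Paris", "France", "EUR")] * 9 + [("Paris", "France", "USD")]
flagged = flag_outliers(records)
```

In this toy data set only the third attribute of the last record is flagged: the record as a whole looks normal, and a single value deviates from the correlation pattern of the others. That is the distinction the abstract draws between attribute outliers and class outliers.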

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koh, J.L.Y., Lee, M.L., Hsu, W., Lam, K.T. (2007). Correlation-Based Detection of Attribute Outliers. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_16

  • DOI: https://doi.org/10.1007/978-3-540-71703-4_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71702-7

  • Online ISBN: 978-3-540-71703-4

  • eBook Packages: Computer Science, Computer Science (R0)
