Detecting Outliers in Categorical Record Databases Based on Attribute Associations

Conference paper
Progress in WWW Research and Development (APWeb 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4976)

Abstract

Outlier detection, a data mining technique for finding rare events, deviant objects, and exceptions in data, has drawn increasing attention in recent years. Most existing outlier detection algorithms focus on numerical data sets. We target categorical record databases and detect records in which many attribute values are not observed even though, given the associations among attributes, they should co-occur with other attribute values in the same record. To detect such records as outliers, we define an outlier degree; in accuracy-evaluation experiments it achieves sufficient detection performance compared with the probabilistic approach used in a related work. We also propose an efficient algorithm for detecting such outlier records. Experiments using real data sets show that our method detects interesting records as outliers.
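The abstract states the idea only at a high level. The Python sketch below illustrates the general intuition under our own assumptions, and is not the outlier degree or the efficient algorithm defined in the paper. Records are treated as sets of (attribute, value) items; single-consequent association rules are mined by brute force (a real system would use a frequent-itemset miner such as Apriori or FP-growth); and a record's score is the sum of the confidences of high-confidence rules whose antecedent it contains but whose consequent value it lacks. The helpers `mine_rules` and `outlier_score`, the thresholds, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]                      # (attribute, value)
Rule = Tuple[FrozenSet[Item], Item, float]  # (antecedent, consequent, confidence)


def mine_rules(records: List[Dict[str, str]], min_support: float = 0.3,
               min_conf: float = 0.8, max_antecedent: int = 2) -> List[Rule]:
    """Brute-force mining of rules X -> y with a single-item consequent.

    Hypothetical helper for illustration: it enumerates every itemset of up to
    max_antecedent + 1 items per record, so it only suits tiny data sets.
    """
    n = len(records)
    counts: Counter = Counter()
    for record in records:
        items = sorted(record.items())
        for k in range(1, max_antecedent + 2):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1

    rules: List[Rule] = []
    for itemset, cnt in counts.items():
        if len(itemset) < 2 or cnt / n < min_support:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a counted itemset was counted too, so this is > 0.
            conf = cnt / counts[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules


def outlier_score(record: Dict[str, str], rules: List[Rule]) -> float:
    """Hypothetical score: sum the confidences of the rules the record violates,
    i.e. it contains the antecedent but not the expected consequent value."""
    items = set(record.items())
    score = 0.0
    for antecedent, (attr, value), conf in rules:
        if antecedent <= items and record.get(attr) != value:
            score += conf
    return score


# Toy categorical records (hypothetical data): four typical, one unusual.
RECORDS = [{"os": "linux", "shell": "bash", "editor": "vim"} for _ in range(4)] + [
    {"os": "linux", "shell": "zsh", "editor": "emacs"},
]

rules = mine_rules(RECORDS)
scores = [outlier_score(r, rules) for r in RECORDS]
print(scores)  # the last record receives the highest score
```

The four typical records violate no mined rule and score 0.0, while the last record contains `os=linux` but lacks the `shell=bash` and `editor=vim` values that usually accompany it, so it scores highest. Summing over violated rules rewards records that break many strong associations at once, which mirrors the abstract's notion of records in which many expected attribute values are not observed.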

Editor information

Yanchun Zhang, Ge Yu, Elisa Bertino, Guandong Xu

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Narita, K., Kitagawa, H. (2008). Detecting Outliers in Categorical Record Databases Based on Attribute Associations. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_13

  • DOI: https://doi.org/10.1007/978-3-540-78849-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78848-5

  • Online ISBN: 978-3-540-78849-2

  • eBook Packages: Computer Science, Computer Science (R0)