Detecting Outliers in Categorical Record Databases Based on Attribute Associations

Narita, Kazuyo; Kitagawa, Hiroyuki

doi:10.1007/978-3-540-78849-2_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4976)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Outlier detection, a data mining technique for detecting rare events, deviant objects, and exceptions in data, has drawn increasing attention in recent years. Most existing outlier detection algorithms focus on numerical data sets. We target categorical record databases and detect records in which many attribute values are not observed even though they should occur in association with other attribute values in the record. To detect such records as outliers, we define an outlier degree, which shows sufficient detection performance in accuracy-evaluation experiments when compared with the probabilistic approach used in a related work. We also propose an efficient algorithm for detecting such outlier records. Experiments using real data sets show that our method detects interesting records as outliers.
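
The idea of flagging records that lack values expected from attribute associations can be illustrated with a minimal sketch. It is not the outlier degree or the algorithm proposed in the paper, neither of which is given in this preview. It assumes single-item association rules mined by simple counting, and the names `mine_rules` and `violation_score`, the thresholds, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.75):
    """Mine rules a -> b between single (attribute, value) items.

    Simplifying assumption: antecedents and consequents are single items.
    """
    n = len(records)
    item_count = Counter()
    pair_count = Counter()
    for rec in records:
        items = sorted(rec.items())  # fixed order so each pair is counted once
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue  # the pair co-occurs too rarely to support a rule
        if count / item_count[a] >= min_confidence:
            rules.append((a, b))
        if count / item_count[b] >= min_confidence:
            rules.append((b, a))
    return rules


def violation_score(record, rules):
    """Count rules whose antecedent is in the record but whose consequent is not."""
    items = set(record.items())
    return sum(1 for a, b in rules if a in items and b not in items)


# Toy data (hypothetical): two consistent groups and one mismatched record.
records = (
    [{"os": "linux", "shell": "bash"}] * 4
    + [{"os": "windows", "shell": "powershell"}] * 4
    + [{"os": "linux", "shell": "powershell"}]
)
rules = mine_rules(records)
scores = [violation_score(rec, rules) for rec in records]
```

In this toy data the mismatched record violates two rules (linux implies bash, powershell implies windows), while the consistent records violate none. A practical method would also need longer antecedents and some normalization of the score.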

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, Japan
Kazuyo Narita
Graduate School of Systems and Information Engineering; Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, Japan
Hiroyuki Kitagawa

Editor information

Yanchun Zhang, Ge Yu, Elisa Bertino, Guandong Xu

Cite this paper

Narita, K., Kitagawa, H. (2008). Detecting Outliers in Categorical Record Databases Based on Attribute Associations. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_13

DOI: https://doi.org/10.1007/978-3-540-78849-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78848-5
Online ISBN: 978-3-540-78849-2
eBook Packages: Computer Science, Computer Science (R0)