Exploiting domain knowledge to detect outliers

Angiulli, Fabrizio; Fassetti, Fabio

doi:10.1007/s10618-013-0310-5

Published: 05 April 2013

Volume 28, pages 519–568, (2014)

Data Mining and Knowledge Discovery

Abstract

We present a novel definition of outlier that embeds available domain knowledge in the outlier discovery process. Specifically, given background knowledge encoded as a set of first-order rules, together with a set of positive and negative examples, our approach aims at singling out the examples that show abnormal behavior. The proposed technique is unsupervised, since no examples are labeled as normal or abnormal, although it is connected to supervised learning, since it is based on induction from examples. We provide a notion of compliance of a set of facts with respect to the background knowledge and a set of examples, which is exploited to detect the examples that prevent the induced hypothesis from generalizing better. By testing compliance with respect to both the direct and the dual concept, we are able to distinguish among three kinds of abnormalities, namely irregular, anomalous, and outlier observations. This allows us to provide a finer characterization of the anomaly at hand and to single out subtle forms of anomalies. Moreover, we are able to provide explanations for the abnormality of an observation, which make the reasons for its exceptionality intelligible. We present both exact and approximate algorithms for mining abnormalities. The approximate algorithms improve execution time while guaranteeing good accuracy. Finally, we discuss peculiarities of the novel approach, present examples of the knowledge mined, analyze the scalability of the algorithms, and provide a comparison with noise handling mechanisms and some alternative approaches.
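
As a purely illustrative sketch, the outcomes of the two compliance tests mentioned in the abstract (one against the direct concept, one against the dual concept) could be combined into a single label as follows. The abstract does not say which test outcome corresponds to which kind of abnormality, so the mapping below is an assumption made only for illustration, and the names `Abnormality` and `classify` are hypothetical. The compliance tests themselves, which in the paper rely on induction from first-order rules, are not modeled here and are represented by plain booleans.

```python
from enum import Enum


class Abnormality(Enum):
    """Labels named in the abstract, plus NORMAL for the compliant case."""
    NORMAL = "normal"
    IRREGULAR = "irregular"
    ANOMALOUS = "anomalous"
    OUTLIER = "outlier"


def classify(compliant_direct: bool, compliant_dual: bool) -> Abnormality:
    """Combine the outcomes of two compliance tests into one label.

    ASSUMPTION (not stated in the abstract): failing only the direct-concept
    test is labeled "irregular", failing only the dual-concept test is
    labeled "anomalous", and failing both is labeled "outlier".
    """
    if compliant_direct and compliant_dual:
        return Abnormality.NORMAL
    if not compliant_direct and not compliant_dual:
        return Abnormality.OUTLIER
    if not compliant_direct:
        return Abnormality.IRREGULAR
    return Abnormality.ANOMALOUS


if __name__ == "__main__":
    # Enumerate the four possible combinations of test outcomes.
    for direct in (True, False):
        for dual in (True, False):
            print(direct, dual, classify(direct, dual).value)
```

Keeping the two test outcomes as separate inputs reflects the point made in the abstract: the finer characterization comes from looking at both tests rather than at a single abnormality score.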

Notes

www.comlab.ox.ac.uk/oucl/research/areas/machlearn/PProgol/pprogol.pl.
http://archive.ics.uci.edu/ml/datasets/Zoo.
http://archive.ics.uci.edu/ml.
Data are available at http://www.comlab.ox.ac.uk/activities/machinelearning/mutagenesis.html.
We employed an Intel Xeon E5620 2.40 GHz based computer with 4 GB of main memory, running the Linux operating system.

References

Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the international conference on management of data (SIGMOD), pp 37–46
Angiulli F, Fassetti F (2009a) Dolphin: an efficient algorithm for mining distance-based outliers in very large datasets. ACM Trans Knowl Discov Data (TKDD) 3(1):Article 4
Angiulli F, Fassetti F (2009b) Outlier detection using inductive logic programming. In: ICDM, pp 693–698
Angiulli F, Pizzuti C (2002) Fast outlier detection in large high-dimensional data sets. In: Proceedings of the international conference on principles of data mining and knowledge discovery (PKDD), pp 15–26
Angiulli F, Pizzuti C (2005) Outlier mining in large high-dimensional data sets. IEEE Trans Knowl Data Eng pp 203–215
Angiulli F, Basta S, Pizzuti C (2006) Distance-based detection and prediction of outliers. IEEE Trans Knowl Data Eng 18(2):145–160
Angiulli F, Greco G, Palopoli L (2007) Outlier detection by logic programming. ACM Trans Comput Log 9(1):Article 7
Angiulli F, Ben-Eliyahu-Zohary R, Palopoli L (2008) Outlier detection using default reasoning. Artif Intell 172(16–17):1837–1872
Bain M, Srinivasan A (1995) Inductive logic programming with large-scale unstructured data. In: Furukawa K, Michie D, Muggleton S (eds) Machine intelligence 14. Clarendon Press, Oxford
Breunig MM, Kriegel H, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the international conference on management of data (SIGMOD), pp 93–104
Bruno G, Garza P, Quintarelli E, Rosato R (2007) Anomaly detection through quasi-functional dependency analysis. J Digit Inf Manag 5(4):190–200
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58
Chawla N, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor 6(1):1–6
Debnath A, de Compadre RL, Debnath G, Shusterman A, Hansch C (1991) The structure–activity relationship of mutagenic aromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J Med Chem 34:786–797
Fassetti F, Fazzinga B (2007) Approximate functional dependencies for XML data. In: ADBIS research communications. Springer, Heidelberg, pp 86–95
He Z, Xu X, Huang J, Deng S (2005) FP-outlier: frequent pattern based outlier detection. Comput Sci Inf Syst 2(1):103–118
Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126
Kirsten M, Wrobel S, Horváth T (2001) Distance based approaches to relational learning and clustering. In: Džeroski S, Lavrač N (eds) Relational data mining. Springer, Berlin, pp 213–232
Kivinen J, Mannila H (1995) Approximate inference of functional dependencies from relations. TCS 149:129–149
Knorr E, Ng R (1998) Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the international conference on very large data bases (VLDB), pp 392–403
Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: KDD, pp 444–452
Lavrač N, Džeroski S (1994) Inductive logic programming: techniques and applications. Ellis Horwood, Chichester
Lavrač N, Džeroski S, Bratko I (1996) Handling imperfect data in inductive logic programming. In: Raedt LD (ed) Advances in inductive logic programming. IOS Press, Amsterdam, pp 48–64
Liu FT, Ting KM, Zhou ZH (2012) Isolation-based anomaly detection. TKDD 6(1):3
Lloyd JW (1987) Foundations of logic programming. Springer, Berlin
Mannila H, Räihä K (1987) Dependency inference. In: VLDB, pp 155–158
Muggleton S (1995) Inverse entailment and Progol. New Gen Comput 13(3–4):245–286
Muggleton S, Feng C (1990) Efficient induction of logic programs. In: First conference on algorithmic learning theory, pp 368–381
Muggleton S, Bain M, Hayes-Michie J, Michie D (1989) An experimental comparison of human and machine learning formalisms. In: Sixth international workshop on machine learning
Novelli N, Cicchetti R (2001) Functional and embedded dependency inference: a data mining point of view. IS 26(7):477–506
Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) LOCI: fast outlier detection using the local correlation integral. In: Proceedings of the international conference on data engineering (ICDE), pp 315–326
Plotkin G (1971) A further note on inductive generalization. In: Machine learning, vol 6, chap 8. American Elsevier, New York, pp 101–124
Quinlan J, Cameron-Jones R (1993) FOIL: a midterm report. In: 6th European conference on machine learning, pp 3–20
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the international conference on management of data (SIGMOD), pp 427–438
Schölkopf B, Burges C, Vapnik V (1995) Extracting support data for a given task. In: KDD, pp 252–257
Srinivasan A, Muggleton S, Sternberg M, King R (1996) Theories for mutagenicity: a study in first-order and feature-based induction. Artif Intell 85(1–2):277–299

Author information

Authors and Affiliations

DIMES Department, University of Calabria, Via P. Bucci 41C, Rende, Cosenza, Italy
Fabrizio Angiulli & Fabio Fassetti

Corresponding author

Correspondence to Fabrizio Angiulli.

Additional information

Responsible editor: Eamonn Keogh.

A preliminary version of this article appears under the title “Outlier Detection using Inductive Logic Programming” in the Proceedings of the IEEE International Conference on Data Mining (ICDM), Miami, Florida, December 6–9, 2009 (Angiulli and Fassetti 2009b).

About this article

Cite this article

Angiulli, F., Fassetti, F. Exploiting domain knowledge to detect outliers. Data Min Knowl Disc 28, 519–568 (2014). https://doi.org/10.1007/s10618-013-0310-5

Received: 12 September 2012
Accepted: 20 March 2013
Published: 05 April 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10618-013-0310-5
