Cross-Outlier Detection

Papadimitriou, Spiros; Faloutsos, Christos

doi:10.1007/978-3-540-45072-6_12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2750))

Included in the following conference series:

International Symposium on Spatial and Temporal Databases

Abstract

The problem of outlier detection has been studied in the context of several domains and has received attention from the database research community. To the best of our knowledge, work to date focuses exclusively on the problem as follows [10]: “given a single set of observations in some space, find those that deviate so as to arouse suspicion that they were generated by a different mechanism.” However, in several domains, we have more than one set of observations (or, equivalently, a single set with class labels assigned to each observation). For example, in astronomical data, labels may involve types of galaxies (e.g., spiral galaxies with an abnormal concentration of elliptical galaxies in their neighborhood); in biodiversity data, labels may involve different population types (e.g., patches of different species populations, food types, diseases, etc.). A single observation may look normal both within its own class and within the entire set of observations. However, when examined with respect to other classes, it may still arouse suspicion. In this paper we consider the problem “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels.” This variant has significant practical importance. Many of the existing outlier detection approaches cannot be extended to this case. We present one practical approach for dealing with this problem and demonstrate its performance on real and synthetic datasets.
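To make the problem statement concrete, the following is a minimal sketch of one naive way to flag cross-outliers. It is not the approach presented in the paper, which is not described in this abstract. The function name `cross_outliers`, the fixed `radius`, and the `k`-standard-deviations rule are all assumptions chosen for illustration: each point of a primary class is scored by how many reference-class points fall within `radius` of it, and points whose count deviates strongly from the average count are reported.

```python
import math
from statistics import mean, pstdev


def cross_outliers(primary, reference, radius, k=3.0):
    """Return indices of primary points with an unusual number of
    reference-class neighbors within `radius`.

    Illustrative heuristic only; the radius and the k-sigma rule are
    assumptions, not the method of the paper.
    """
    # Count reference-class neighbors around every primary point.
    counts = [
        sum(1 for q in reference if math.dist(p, q) <= radius)
        for p in primary
    ]
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Every primary point sees the same number of reference points.
        return []
    # Flag counts that lie more than k standard deviations from the mean.
    return [i for i, c in enumerate(counts) if abs(c - mu) > k * sigma]


# Hypothetical data: 20 "spiral" points on a line, each with one nearby
# "elliptical" point, plus a dense clump of ellipticals around spiral 5.
spirals = [(float(x), 0.0) for x in range(20)]
ellipticals = [(float(x), 0.3) for x in range(20)]
ellipticals += [(5.0, 0.04 * j) for j in range(1, 11)]

print(cross_outliers(spirals, ellipticals, radius=0.5))  # → [5]
```

In this toy data, spiral 5 is evenly spaced among the other spirals and so looks unremarkable within its own class; it stands out only once the elliptical class is taken into account, which is the situation the abstract describes.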

This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, and IIS-0205224, by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001, and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA, or other funding parties.

References

1. Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. SIGMOD (2001)
2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proc. VLDB, pp. 487–499 (1994)
3. Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large databases. In: Proc. KDD, pp. 164–169 (1996)
4. Barbará, D., Chen, P.: Using the fractal dimension to cluster datasets. In: Proc. KDD, pp. 260–264 (2000)
5. Barnett, V., Lewis, T.: Outliers in Statistical Data. John Wiley, Chichester (1994)
6. Belussi, A., Faloutsos, C.: Estimating the selectivity of spatial queries using the correlation fractal dimension. In: Proc. VLDB, pp. 299–310 (1995)
7. Berchtold, S., Böhm, C., Keim, D.A., Kriegel, H.-P.: A cost model for nearest neighbor search in high-dimensional data space. In: Proc. PODS, pp. 78–86 (1997)
8. Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: Proc. SIGMOD, pp. 93–104 (2000)
9. Faloutsos, C., Seeger, B., Traina Jr., C., Traina, A.: Spatial join selectivity using power laws. In: Proc. SIGMOD, pp. 177–188 (2000)
10. Hawkins, D.M.: Identification of Outliers. Chapman and Hall, Boca Raton (1980)
11. Jagadish, H.V., Koudas, N., Muthukrishnan, S.: Mining deviants in a time series database. In: Proc. VLDB, pp. 102–113 (1999)
12. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comp. Surveys 31(3), 264–323 (1999)
13. Johnson, T., Kwok, I., Ng, R.T.: Fast computation of 2-dimensional depth contours. In: Proc. KDD, pp. 224–228 (1998)
14. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proc. VLDB, pp. 392–403 (1998)
15. Knorr, E.M., Ng, R.T.: Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE TKDE 8(6), 884–897 (1996)
16. Knorr, E.M., Ng, R.T.: A unified notion of outliers: Properties and computation. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 219–222. Springer, Heidelberg (1997)
17. Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: Proc. VLDB, pp. 211–222 (1999)
18. Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: Algorithms and applications. VLDB Journal 8, 237–253 (2000)
19. Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proc. VLDB, pp. 144–155 (1994)
20. Papadimitriou, S., Kitagawa, H., Gibbons, P.B., Faloutsos, C.: LOCI: Fast outlier detection using the local correlation integral. In: Proc. ICDE (2003)
21. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. John Wiley and Sons, Chichester (1987)
22. Traina, A., Traina, C., Papadimitriou, S., Faloutsos, C.: Tri-Plots: Scalable tools for multidimensional data mining. In: Proc. KDD, pp. 184–193 (2001)

Author information

Authors and Affiliations

Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA
Spiros Papadimitriou & Christos Faloutsos

Editor information

Editors and Affiliations

Research Academic Computer Technology Institute, Patras, Greece
Thanasis Hadzilacos
Data Engineering Research Lab, Department of Informatics, Aristotle University, 54124, Thessaloniki, Greece
Yannis Manolopoulos
Flinders University, Adelaide, Australia
John Roddick
Department of Informatics, University of Piraeus, Greece
Yannis Theodoridis

Cite this paper

Papadimitriou, S., Faloutsos, C. (2003). Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds) Advances in Spatial and Temporal Databases. SSTD 2003. Lecture Notes in Computer Science, vol 2750. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45072-6_12

DOI: https://doi.org/10.1007/978-3-540-45072-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40535-1
Online ISBN: 978-3-540-45072-6
eBook Packages: Springer Book Archive
