Abstract
The problem of outlier detection has been studied in the context of several domains and has received attention from the database research community. To the best of our knowledge, work to date focuses exclusively on the problem as follows [10]: “given a single set of observations in some space, find those that deviate so as to arouse suspicion that they were generated by a different mechanism.” However, in several domains, we have more than one set of observations (or, equivalently, a single set with class labels assigned to each observation). For example, in astronomical data, labels may involve types of galaxies (e.g., spiral galaxies with an abnormal concentration of elliptical galaxies in their neighborhood); in biodiversity data, labels may involve different population types (e.g., patches of different species populations, food types, diseases, etc.). A single observation may look normal both within its own class and within the entire set of observations. However, when examined with respect to other classes, it may still arouse suspicions. In this paper we consider the problem “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels.” This variant has significant practical importance. Many of the existing outlier detection approaches cannot be extended to this case. We present one practical approach for dealing with this problem and demonstrate its performance on real and synthetic datasets.
This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, and IIS-0205224, by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001, and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA, or other funding parties.
References
Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. SIGMOD (2001)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proc. VLDB, pp. 487–499 (1994)
Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large databases. In: Proc. KDD, pp. 164–169 (1996)
Barbará, D., Chen, P.: Using the fractal dimension to cluster datasets. In: Proc. KDD, pp. 260–264 (2000)
Barnett, V., Lewis, T.: Outliers in Statistical Data. John Wiley, Chichester (1994)
Belussi, A., Faloutsos, C.: Estimating the selectivity of spatial queries using the correlation fractal dimension. In: Proc. VLDB, pp. 299–310 (1995)
Berchtold, S., Böhm, C., Keim, D.A., Kriegel, H.-P.: A cost model for nearest neighbor search in high-dimensional data space. In: Proc. PODS, pp. 78–86 (1997)
Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: Proc. SIGMOD Conf., pp. 93–104 (2000)
Faloutsos, C., Seeger, B., Traina Jr., C., Traina, A.: Spatial join selectivity using power laws. In: Proc. SIGMOD, pp. 177–188 (2000)
Hawkins, D.M.: Identification of Outliers. Chapman and Hall, Boca Raton (1980)
Jagadish, H.V., Koudas, N., Muthukrishnan, S.: Mining deviants in a time series database. In: Proc. VLDB, pp. 102–113 (1999)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comp. Surveys 31(3), 264–323 (1999)
Johnson, T., Kwok, I., Ng, R.T.: Fast computation of 2-dimensional depth contours. In: Proc. KDD, pp. 224–228 (1998)
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proc. VLDB, pp. 392–403 (1998)
Knorr, E.M., Ng, R.T.: Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE TKDE 8(6), 884–897 (1996)
Knorr, E.M., Ng, R.T.: A unified notion of outliers: Properties and computation. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 219–222. Springer, Heidelberg (1997)
Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: Proc. VLDB, pp. 211–222 (1999)
Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: Algorithms and applications. VLDB Journal 8, 237–253 (2000)
Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proc. VLDB, pp. 144–155 (1994)
Papadimitriou, S., Kitagawa, H., Gibbons, P.B., Faloutsos, C.: LOCI: Fast outlier detection using the local correlation integral. In: Proc. ICDE (2003)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. John Wiley and Sons, Chichester (1987)
Traina, A., Traina, C., Papadimitriou, S., Faloutsos, C.: Tri-Plots: Scalable tools for multidimensional data mining. In: Proc. KDD, pp. 184–193 (2001)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Papadimitriou, S., Faloutsos, C. (2003). Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds) Advances in Spatial and Temporal Databases. SSTD 2003. Lecture Notes in Computer Science, vol 2750. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45072-6_12
DOI: https://doi.org/10.1007/978-3-540-45072-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40535-1
Online ISBN: 978-3-540-45072-6
eBook Packages: Springer Book Archive