Cross-Outlier Detection

  • Conference paper
Advances in Spatial and Temporal Databases (SSTD 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2750))

Abstract

The problem of outlier detection has been studied in the context of several domains and has received attention from the database research community. To the best of our knowledge, work to date focuses exclusively on the problem as follows [10]: “given a single set of observations in some space, find those that deviate so as to arouse suspicion that they were generated by a different mechanism.” However, in several domains, we have more than one set of observations (or, equivalently, a single set with class labels assigned to each observation). For example, in astronomical data, labels may involve types of galaxies (e.g., spiral galaxies with an abnormal concentration of elliptical galaxies in their neighborhood); in biodiversity data, labels may involve different population types (e.g., patches of different species populations, food types, diseases, etc.). A single observation may look normal both within its own class and within the entire set of observations. However, when examined with respect to other classes, it may still arouse suspicion. In this paper we consider the problem “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels.” This variant has significant practical importance. Many of the existing outlier detection approaches cannot be extended to this case. We present one practical approach for dealing with this problem and demonstrate its performance on real and synthetic datasets.
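
The problem statement can be made concrete with a small sketch. This is only an illustration, not the approach of the paper, which the abstract does not describe. It assumes two labeled point sets, counts for each point of a "primary" class how many points of a "reference" class fall within a fixed radius, and flags primary points whose count deviates strongly from the typical count. The function names, the radius, and the threshold k are hypothetical choices made for this example.

```python
import math
from statistics import mean, pstdev


def cross_neighbor_counts(primary, reference, radius):
    """For each primary point, count the reference points within `radius`."""
    return [
        sum(1 for q in reference if math.dist(p, q) <= radius)
        for p in primary
    ]


def cross_outliers(primary, reference, radius, k=2.0):
    """Return indices of primary points whose reference-neighbor count
    lies more than k standard deviations from the mean count.

    Illustrative heuristic only; the threshold rule is an assumption.
    """
    counts = cross_neighbor_counts(primary, reference, radius)
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # All primary points see the same number of reference points.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) > k * sigma]


# Primary class: a regular 5 x 5 grid, so no point stands out within its class.
primary = [(float(i), float(j)) for i in range(5) for j in range(5)]
# Reference class: four points clustered tightly around the grid point (2, 2).
reference = [(2.2, 2.0), (1.8, 2.0), (2.0, 2.2), (2.0, 1.8)]

print(cross_outliers(primary, reference, radius=0.5))  # prints [12]
```

Index 12 is the grid point (2, 2): it is spaced like every other point of its own class, yet it is the only one with reference-class points in its neighborhood, which is the kind of observation the cross-class formulation is meant to surface.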

This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, and IIS-0205224, by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001, and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA, or other funding parties.

References

  1. Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. SIGMOD (2001)
  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proc. VLDB, pp. 487–499 (1994)
  3. Arning, A., Agrawal, R., Raghavan, P.: A linear method for deviation detection in large databases. In: Proc. KDD, pp. 164–169 (1996)
  4. Barbará, D., Chen, P.: Using the fractal dimension to cluster datasets. In: Proc. KDD, pp. 260–264 (2000)
  5. Barnett, V., Lewis, T.: Outliers in Statistical Data. John Wiley, Chichester (1994)
  6. Belussi, A., Faloutsos, C.: Estimating the selectivity of spatial queries using the correlation fractal dimension. In: Proc. VLDB, pp. 299–310 (1995)
  7. Berchtold, S., Böhm, C., Keim, D.A., Kriegel, H.-P.: A cost model for nearest neighbor search in high-dimensional data space. In: Proc. PODS, pp. 78–86 (1997)
  8. Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: Proc. SIGMOD, pp. 93–104 (2000)
  9. Faloutsos, C., Seeger, B., Traina Jr., C., Traina, A.: Spatial join selectivity using power laws. In: Proc. SIGMOD, pp. 177–188 (2000)
  10. Hawkins, D.M.: Identification of Outliers. Chapman and Hall, Boca Raton (1980)
  11. Jagadish, H.V., Koudas, N., Muthukrishnan, S.: Mining deviants in a time series database. In: Proc. VLDB, pp. 102–113 (1999)
  12. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comp. Surveys 31(3), 264–323 (1999)
  13. Johnson, T., Kwok, I., Ng, R.T.: Fast computation of 2-dimensional depth contours. In: Proc. KDD, pp. 224–228 (1998)
  14. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proc. VLDB, pp. 392–403 (1998)
  15. Knorr, E.M., Ng, R.T.: Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE TKDE 8(6), 884–897 (1996)
  16. Knorr, E.M., Ng, R.T.: A unified notion of outliers: Properties and computation. In: Komorowski, J., Żytkow, J.M. (eds.) PKDD 1997. LNCS, vol. 1263, pp. 219–222. Springer, Heidelberg (1997)
  17. Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: Proc. VLDB, pp. 211–222 (1999)
  18. Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: Algorithms and applications. VLDB Journal 8, 237–253 (2000)
  19. Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proc. VLDB, pp. 144–155 (1994)
  20. Papadimitriou, S., Kitagawa, H., Gibbons, P.B., Faloutsos, C.: LOCI: Fast outlier detection using the local correlation integral. In: Proc. ICDE (2003)
  21. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. John Wiley and Sons, Chichester (1987)
  22. Traina, A., Traina, C., Papadimitriou, S., Faloutsos, C.: Tri-Plots: Scalable tools for multidimensional data mining. In: Proc. KDD, pp. 184–193 (2001)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

Cite this paper

Papadimitriou, S., Faloutsos, C. (2003). Cross-Outlier Detection. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds) Advances in Spatial and Temporal Databases. SSTD 2003. Lecture Notes in Computer Science, vol 2750. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45072-6_12

  • DOI: https://doi.org/10.1007/978-3-540-45072-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40535-1

  • Online ISBN: 978-3-540-45072-6

  • eBook Packages: Springer Book Archive
