Outlier Detection with Arbitrary Probability Functions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8249)

Abstract

We consider the problem of unsupervised outlier detection in large collections of data objects that are modeled by arbitrary multidimensional probability density functions. Specifically, we present a novel definition of outlier for uncertain data under the attribute-level uncertainty model, in which an uncertain object always exists but its actual value is modeled by a multivariate probability density function (pdf). The proposed notion of outlier is distance-based: an uncertain object is declared an outlier on the basis of the expected number of its neighbors in the data set. To the best of our knowledge, this is the first work to address the unsupervised outlier detection problem in the full feature space for data objects modeled by arbitrarily shaped multidimensional distribution functions. We present properties that reduce the number of probability distance computations, together with an efficient algorithm for determining the outliers in an input uncertain data set.
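The distance-based definition in the abstract can be illustrated with a small sketch. By linearity of expectation, the expected number of neighbors of an object o within a radius R is the sum, over every other object o', of Pr[dist(o, o') <= R]; an object is flagged when that sum falls below a threshold k. This is not the authors' algorithm. It assumes that each uncertain object is represented by samples drawn from its pdf (a Monte Carlo estimate in place of exact integration), that objects are independent, and that distance is Euclidean. The names `neighbor_probability`, `uncertain_outliers`, `gaussian_object`, `radius` and `k` are hypothetical. The early-stopping rule is a generic pruning trick chosen for illustration; it is not the paper's pruning properties, which are not reproduced here.

```python
import math
import random


def neighbor_probability(samples_a, samples_b, radius):
    """Monte Carlo estimate of Pr[dist(A, B) <= radius].

    samples_a, samples_b: equal-length lists of points (tuples), each holding
    draws from one object's pdf. Assuming A and B are independent, pairing the
    i-th draw of A with the i-th draw of B yields draws from their joint pdf.
    """
    hits = sum(1 for a, b in zip(samples_a, samples_b) if math.dist(a, b) <= radius)
    return hits / len(samples_a)


def uncertain_outliers(samples, radius, k):
    """Indices of objects whose expected number of neighbors is below k."""
    outliers = []
    for i, s_i in enumerate(samples):
        expected = 0.0
        for j, s_j in enumerate(samples):
            if j == i:
                continue
            # E[#neighbors of i] = sum of pairwise neighbor probabilities.
            expected += neighbor_probability(s_i, s_j, radius)
            if expected >= k:
                # Illustrative early stop (an assumption, not the paper's rule):
                # every term is non-negative, so i can no longer be an outlier.
                break
        if expected < k:
            outliers.append(i)
    return outliers


def gaussian_object(rng, mean, std, m):
    """Hypothetical helper: m draws from an axis-aligned Gaussian pdf."""
    return [tuple(rng.gauss(mu, std) for mu in mean) for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Ten tightly clustered uncertain objects around the origin ...
    objs = [gaussian_object(rng, (0.0, 0.0), 0.1, 2000) for _ in range(10)]
    # ... plus one uncertain object centred far away at (10, 10).
    objs.append(gaussian_object(rng, (10.0, 10.0), 0.1, 2000))
    print(uncertain_outliers(objs, radius=1.0, k=3))  # prints [10]
```

Sampling is used here because it works for any pdf that can be sampled, which matches the "arbitrary pdf" setting. Exact or adaptive numerical integration of the pairwise probability would be a drop-in replacement for `neighbor_probability`.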

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Angiulli, F., Fassetti, F. (2013). Outlier Detection with Arbitrary Probability Functions. In: Baldoni, M., Baroglio, C., Boella, G., Micalizio, R. (eds) AI*IA 2013: Advances in Artificial Intelligence. AI*IA 2013. Lecture Notes in Computer Science, vol 8249. Springer, Cham. https://doi.org/10.1007/978-3-319-03524-6_36

  • DOI: https://doi.org/10.1007/978-3-319-03524-6_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03523-9

  • Online ISBN: 978-3-319-03524-6

  • eBook Packages: Computer Science, Computer Science (R0)