Continuous Kernel-Based Outlier Detection over Distributed Data Streams

  • Conference paper
Frontiers of High Performance Computing and Networking ISPA 2007 Workshops (ISPA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4743)

Abstract

Stream data are often transmitted over a distributed network but are, in many cases, too voluminous to be collected at a central location. Computation must instead be distributed while still guaranteeing high-quality results in real time as new data arrive. In this paper, we first formalize the problem of continuous outlier detection over distributed, evolving data streams. We then propose two novel outlier measures and algorithms that identify outliers in a single pass. Experiments on synthetic and real data show that the proposed methods are both efficient and effective compared with existing outlier detection algorithms.

Editor information

Parimala Thulasiraman, Xubin He, Tony Li Xu, Mieso K. Denko, Ruppa K. Thulasiram, Laurence T. Yang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, L., Han, W., Zou, P., Jia, Y. (2007). Continuous Kernel-Based Outlier Detection over Distributed Data Streams. In: Thulasiraman, P., He, X., Xu, T.L., Denko, M.K., Thulasiram, R.K., Yang, L.T. (eds) Frontiers of High Performance Computing and Networking ISPA 2007 Workshops. ISPA 2007. Lecture Notes in Computer Science, vol 4743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74767-3_32

  • DOI: https://doi.org/10.1007/978-3-540-74767-3_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74766-6

  • Online ISBN: 978-3-540-74767-3

  • eBook Packages: Computer Science, Computer Science (R0)
