DBOD-DS: Distance Based Outlier Detection for Data Streams

  • Conference paper
Database and Expert Systems Applications (DEXA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

The data stream is a newly emerging data model for applications such as environmental monitoring, Web click streams, and network traffic monitoring. A data stream consists of an infinite sequence of timestamped data points arriving from an external data source. Data sources are typically located on site and are highly vulnerable to external attacks and natural calamities, so outliers are very common in the resulting datasets. Existing outlier detection techniques are inadequate for data streams because of their metamorphic data distribution and uncertainty. In this paper we propose an outlier detection technique, called Distance-Based Outlier Detection for Data Streams (DBOD-DS), based on a novel continuously adaptive probability density function that addresses the new issues that data streams raise. Extensive experiments on a real dataset for meteorology applications show that DBOD-DS is more accurate than existing techniques.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sadik, M.S., Gruenwald, L. (2010). DBOD-DS: Distance Based Outlier Detection for Data Streams. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_9

  • DOI: https://doi.org/10.1007/978-3-642-15364-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15363-1

  • Online ISBN: 978-3-642-15364-8

  • eBook Packages: Computer Science, Computer Science (R0)
