Abstract
The data stream is an emerging data model for applications such as environmental monitoring, Web click streams, and network traffic monitoring. A data stream consists of an infinite sequence of timestamped data points arriving from an external data source. Data sources are typically located on site and are highly vulnerable to external attacks and natural calamities, so outliers are very common in the resulting datasets. Existing outlier detection techniques are inadequate for data streams because of their metamorphic (continuously changing) data distributions and their uncertainty. In this paper we propose an outlier detection technique, called Distance-Based Outlier Detection for Data Streams (DBOD-DS), based on a novel continuously adaptive probability density function that addresses the new issues data streams raise. Extensive experiments on a real dataset from a meteorology application show that DBOD-DS is more accurate than existing techniques.
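To make the general notion of a distance-based outlier on a stream concrete, the following is a minimal sketch and not the DBOD-DS algorithm itself: it uses a plain sliding window rather than the adaptive probability density function the paper proposes. The function name `detect_outliers`, its parameters (`radius`, `min_fraction`, `window_size`), and the sample readings are all hypothetical, chosen only for illustration.

```python
from collections import deque


def detect_outliers(stream, radius=2.0, min_fraction=0.5, window_size=5):
    """Flag values with too few neighbours within `radius` in a sliding window.

    Illustrative assumption: a value is an outlier when fewer than
    `min_fraction` of the previous `window_size` values lie within `radius`.
    """
    window = deque(maxlen=window_size)  # oldest value is dropped automatically
    flags = []
    for value in stream:
        if len(window) < window_size:
            # Warm-up: not enough history yet, so nothing is flagged.
            flags.append(False)
        else:
            neighbours = sum(1 for w in window if abs(w - value) <= radius)
            flags.append(neighbours / len(window) < min_fraction)
        # Every value, flagged or not, joins the window in this simple sketch.
        window.append(value)
    return flags


# Hypothetical temperature readings with one injected spike.
readings = [20.1, 20.4, 19.8, 20.0, 20.3, 35.0, 20.2, 20.1]
flags = detect_outliers(readings)
```

With these made-up readings only the spike at 35.0 is flagged. A fixed window forgets old data abruptly, which is one reason a continuously adapting density estimate may suit a changing distribution better.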
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sadik, M.S., Gruenwald, L. (2010). DBOD-DS: Distance Based Outlier Detection for Data Streams. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_9
DOI: https://doi.org/10.1007/978-3-642-15364-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science (R0)