Change detection in streaming data in the era of big data: models and issues

Published: 25 September 2014

Abstract

Big Data is characterized by its three Vs: velocity, volume, and variety. The area of data stream processing has long dealt with the first two Vs, velocity and volume. Over a decade of intensive research, the community has produced many important results in the area. The third V, variety, has emerged from social media and the large volumes of unstructured data it generates, and streaming techniques have recently been proposed to address this emerging need. However, a hidden factor can represent an important fourth V: variability, or change. Our world is changing rapidly, and accounting for variability is a crucial success factor. This paper provides a survey of change detection techniques as applied to streaming data. The review is timely given the rise of Big Data technologies and the need to have this important aspect highlighted and its techniques categorized and detailed.

Published In

ACM SIGKDD Explorations Newsletter, Volume 16, Issue 1
Special issue on big data
June 2014
63 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2674026

Publisher

Association for Computing Machinery, New York, NY, United States
