A grid density based framework for classifying streaming data in the presence of concept drift

Sethi, Tegjyot Singh; Kantardzic, Mehmed; Hu, Hanquing

doi:10.1007/s10844-015-0358-3

Published: 09 May 2015

Volume 46, pages 179–211, (2016)

Journal of Intelligent Information Systems

Tegjyot Singh Sethi¹,
Mehmed Kantardzic² &
Hanquing Hu¹

Abstract

Mining data streams is the process of extracting information from continuous, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one-pass and operate under strict limitations of memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in the literature assume a 100% labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass and incremental, and to work with limited memory to perform any-time classification on demand. Experimental comparison with state-of-the-art concept drift handling systems demonstrates the GC3 framework's ability to provide high classification performance, using fewer models in the ensemble and with only 4–6% of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real-world data stream classification applications.
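To make the grid-based ideas in the abstract more concrete, the following is a minimal Python sketch of one-pass grid bookkeeping with a per-cell labeling budget. It is not the GC3 algorithm: the abstract does not give the procedure, so the class name `GridSampler`, the parameters `cell_width` and `per_cell_budget`, and the rule "request a label only while a cell is under its budget" are all assumptions made for illustration. The sketch shows how a point is mapped to a grid cell, how the first point in a previously empty cell can serve as a crude signal that the input space has changed, and how capping label requests per cell spreads labeling roughly uniformly over the occupied space.

```python
import math
from collections import defaultdict


class GridSampler:
    """Illustrative sketch only; hypothetical names, not the paper's GC3 method."""

    def __init__(self, cell_width, per_cell_budget):
        self.cell_width = cell_width            # assumed fixed, equal-width grid
        self.per_cell_budget = per_cell_budget  # assumed cap on labels per cell
        self.density = defaultdict(int)         # points seen so far in each cell
        self.labeled = defaultdict(int)         # label requests issued per cell

    def cell_of(self, point):
        # Map each coordinate to the integer index of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def observe(self, point):
        """Process one point in a single pass.

        Returns (is_new_cell, request_label).
        """
        cell = self.cell_of(point)
        is_new_cell = cell not in self.density  # first point ever seen in this cell
        self.density[cell] += 1
        request_label = self.labeled[cell] < self.per_cell_budget
        if request_label:
            self.labeled[cell] += 1
        return is_new_cell, request_label


# Usage: three points fall in one cell, a fourth lands in a previously empty cell.
stream = [(0.1, 0.2), (0.15, 0.25), (0.12, 0.22), (0.9, 0.9)]
sampler = GridSampler(cell_width=0.5, per_cell_budget=2)
results = [sampler.observe(p) for p in stream]
```

With a budget of two labels per cell, the third point in the crowded cell is not sent for labeling, while the point in the new cell is. Memory grows with the number of occupied cells rather than with the length of the stream, which fits the one-pass, limited-memory setting the abstract describes.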

Author information

Authors and Affiliations

Data Mining Lab, J.B. Speed School of Engineering, University of Louisville, Louisville, KY, USA
Tegjyot Singh Sethi & Hanquing Hu
Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY, USA
Mehmed Kantardzic

Corresponding author

Correspondence to Tegjyot Singh Sethi.

About this article

Cite this article

Sethi, T.S., Kantardzic, M. & Hu, H. A grid density based framework for classifying streaming data in the presence of concept drift. J Intell Inf Syst 46, 179–211 (2016). https://doi.org/10.1007/s10844-015-0358-3

Received: 26 February 2014
Revised: 13 April 2015
Accepted: 13 April 2015
Published: 09 May 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s10844-015-0358-3
