Efficient mining of high-speed uncertain data streams

Applied Intelligence

Abstract

Most existing algorithms for data stream classification are designed for precise and complete data. However, data in many real-life applications is inherently uncertain owing to instrument inaccuracy, wireless transmission errors, and similar causes. We propose UELM-MapReduce, a parallel ensemble classifier based on the Extreme Learning Machine (ELM) and MapReduce, for handling uncertain data streams. The ensemble is trained efficiently and in parallel from sequential training chunks of the uncertain data stream. The weight of each base classifier is adjusted according to its mean square error on the most recent test chunk, and the classifier with the lowest accuracy is replaced. UELM-MapReduce can thus classify uncertain data streams efficiently and accurately while handling concept drift. Experimental results demonstrate that UELM-MapReduce outperforms other methods in both prediction accuracy and computational efficiency.
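
The chunk-based ensemble update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the inverse-error weight used here is an assumption, as are the names `mean_square_error`, `update_ensemble`, and `predict`. Base classifiers are modelled as plain functions that return class probabilities (standing in for ELMs), and "lowest accuracy" is approximated by "highest mean square error" on the latest chunk.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a base classifier maps features to class probabilities.
Classifier = Callable[[Sequence[float]], Sequence[float]]
Chunk = List[Tuple[Sequence[float], int]]  # (features, true class index)


def mean_square_error(clf: Classifier, chunk: Chunk) -> float:
    """Mean of (1 - p_true)^2, where p_true is the probability of the true class."""
    return sum((1.0 - clf(x)[y]) ** 2 for x, y in chunk) / len(chunk)


def update_ensemble(
    ensemble: List[Classifier],
    new_clf: Classifier,
    chunk: Chunk,
    eps: float = 1e-9,
) -> Tuple[List[Classifier], List[float]]:
    """Replace the worst member with new_clf and reweight on the latest chunk.

    Assumption: weights are proportional to 1 / (MSE + eps) and normalized
    to sum to 1; the paper's exact formula may differ.
    """
    errors = [mean_square_error(c, chunk) for c in ensemble]
    # The member with the highest error is treated as the least accurate.
    worst = max(range(len(ensemble)), key=errors.__getitem__)
    members = list(ensemble)
    members[worst] = new_clf
    errors[worst] = mean_square_error(new_clf, chunk)
    raw = [1.0 / (e + eps) for e in errors]
    total = sum(raw)
    return members, [r / total for r in raw]


def predict(
    members: List[Classifier], weights: List[float], x: Sequence[float]
) -> int:
    """Weighted vote: return the class with the largest weighted probability."""
    n_classes = len(members[0](x))
    scores = [
        sum(w * m(x)[k] for m, w in zip(members, weights))
        for k in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)
```

In a streaming setting, `update_ensemble` would be called once per incoming chunk, with `new_clf` trained on the most recent training chunk. That is how an ensemble of this kind tracks concept drift: members that fit an outdated concept accumulate error, lose weight, and are eventually replaced.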

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61173029, 6122182). It was performed while the first author was a Visiting Scholar at Brigham Young University.

Author information

Corresponding author

Correspondence to Christophe Giraud-Carrier.

About this article

Cite this article

Han, D., Giraud-Carrier, C. & Li, S. Efficient mining of high-speed uncertain data streams. Appl Intell 43, 773–785 (2015). https://doi.org/10.1007/s10489-015-0675-9

  • DOI: https://doi.org/10.1007/s10489-015-0675-9
