Abstract
Currently available algorithms for data stream classification are mostly designed for precise and complete data. However, data in many real-life applications are inherently uncertain because of instrument inaccuracy, wireless transmission errors, and similar factors. We propose UELM-MapReduce, a parallel ensemble classifier based on the Extreme Learning Machine (ELM) and MapReduce for handling uncertain data streams. An efficient parallel ELM-based ensemble classifier is trained from sequential training chunks of the uncertain data stream. The weight of each base classifier in the ensemble is adjusted according to its mean square error on the most recent test chunk, and the classifier with the lowest accuracy is replaced. UELM-MapReduce can thus classify uncertain data streams both efficiently and accurately while handling concept drift effectively. Experimental results show that UELM-MapReduce outperforms other methods in both prediction accuracy and computational efficiency.
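The ensemble update described in the abstract (reweighting by mean square error on the newest chunk and replacing the weakest member) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the inverse-MSE weights below are an assumption, the highest MSE is used as a stand-in for "lowest accuracy", and the names mean_square_error, update_ensemble, and predict are hypothetical. ELM training, the uncertainty model, and the MapReduce parallelization are all omitted; a base classifier is simply any callable that returns per-class probabilities.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A base classifier maps a feature vector to per-class probabilities.
Classifier = Callable[[Sequence[float]], Dict[str, float]]
# A chunk is a list of (features, true label) pairs.
Chunk = List[Tuple[Sequence[float], str]]
Ensemble = List[Tuple[Classifier, float]]


def mean_square_error(clf: Classifier, chunk: Chunk) -> float:
    """Average of (1 - p(true label))^2 over the chunk."""
    total = 0.0
    for features, label in chunk:
        total += (1.0 - clf(features).get(label, 0.0)) ** 2
    return total / len(chunk)


def update_ensemble(ensemble: Ensemble, candidate: Classifier, chunk: Chunk,
                    max_size: int, eps: float = 1e-9) -> Ensemble:
    """Add the candidate, evicting the weakest member if full, then reweight."""
    members = [clf for clf, _ in ensemble]
    if len(members) >= max_size:
        # Assumption: highest MSE on the newest chunk marks the weakest member.
        worst = max(members, key=lambda clf: mean_square_error(clf, chunk))
        members.remove(worst)
    members.append(candidate)
    # Assumed weighting scheme: inverse MSE, normalised to sum to 1.
    raw = [1.0 / (mean_square_error(clf, chunk) + eps) for clf in members]
    total = sum(raw)
    return [(clf, w / total) for clf, w in zip(members, raw)]


def predict(ensemble: Ensemble, features: Sequence[float]) -> str:
    """Weighted vote over the per-class probabilities of all members."""
    votes: Dict[str, float] = {}
    for clf, weight in ensemble:
        for label, prob in clf(features).items():
            votes[label] = votes.get(label, 0.0) + weight * prob
    return max(votes, key=lambda label: votes[label])
```

In use, update_ensemble would be called once per incoming chunk with a classifier trained on recent data, so members that no longer fit the current concept lose weight and are eventually evicted.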




Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61173029, 6122182). It was performed while the first author was a Visiting Scholar at Brigham Young University.
Cite this article
Han, D., Giraud-Carrier, C. & Li, S. Efficient mining of high-speed uncertain data streams. Appl Intell 43, 773–785 (2015). https://doi.org/10.1007/s10489-015-0675-9
DOI: https://doi.org/10.1007/s10489-015-0675-9