Abstract
Conventional classification algorithms are not well suited to the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce the Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present the Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM, and their weights are updated dynamically according to classification results. WE-DELM improves both the efficiency of model learning and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on several evaluation criteria, including efficiency, accuracy, and speedup.
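As a rough illustration of the ideas summarized above, the following sketch trains a regularized ELM by accumulating the matrix products H^T H and H^T T chunk by chunk, then combines several such models in a weighted vote. It is a minimal sketch under stated assumptions, not the DELM or WE-DELM algorithm itself: the abstract does not give the decomposition or the weight-update rule, so the chunk-wise summation, the accuracy-based weighting, and all function names (`train_elm_chunked`, `update_weights`, `ensemble_predict`) are hypothetical choices made for illustration.

```python
import numpy as np


def hidden_output(X, W, b):
    """Sigmoid hidden-layer output H for inputs X of shape (n, d)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm_chunked(chunks, n_hidden, n_classes, reg=1e-3, seed=0):
    """Train a regularized ELM from an iterable of (X, y) chunks.

    Each chunk contributes independent partial sums H^T H and H^T T, so the
    loop body could run on separate workers and the sums be combined later.
    Here the loop runs sequentially (assumption: illustration only).
    """
    rng = np.random.default_rng(seed)
    W = b = None
    U = np.zeros((n_hidden, n_hidden))   # running sum of H^T H
    V = np.zeros((n_hidden, n_classes))  # running sum of H^T T
    for X, y in chunks:
        if W is None:
            # Random input weights and biases are drawn once and never trained.
            W = rng.standard_normal((X.shape[1], n_hidden))
            b = rng.standard_normal(n_hidden)
        H = hidden_output(X, W, b)
        T = np.eye(n_classes)[y]         # one-hot targets
        U += H.T @ H
        V += H.T @ T
    # Output weights solve the ridge-regularized least-squares problem.
    beta = np.linalg.solve(U + reg * np.eye(n_hidden), V)
    return W, b, beta


def predict_scores(model, X):
    """Per-class scores of a single ELM model."""
    W, b, beta = model
    return hidden_output(X, W, b) @ beta


def update_weights(models, X, y):
    """Hypothetical rule: weight each model by its accuracy on a labeled chunk."""
    acc = np.array([(predict_scores(m, X).argmax(axis=1) == y).mean()
                    for m in models])
    total = acc.sum()
    if total == 0:
        return np.full(len(models), 1.0 / len(models))
    return acc / total


def ensemble_predict(models, weights, X):
    """Weighted vote: sum the weighted class scores and take the argmax."""
    scores = sum(w * predict_scores(m, X) for m, w in zip(models, weights))
    return scores.argmax(axis=1)
```

In a streaming setting, one would presumably train a new base model on each incoming labeled chunk, call `update_weights` on the most recent chunk so that models fitted to an outdated concept lose influence, and classify new records with `ensemble_predict`.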
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182.
Cite this article
Han, DH., Zhang, X. & Wang, GR. Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine. J. Comput. Sci. Technol. 30, 874–887 (2015). https://doi.org/10.1007/s11390-015-1566-6
DOI: https://doi.org/10.1007/s11390-015-1566-6