Abstract
A pseudo-random generator is an algorithm that generates a sequence of objects from a truly random seed; the sequence is fully determined by the seed and is therefore not truly random. Such generators are widely used in applications such as cryptography and simulation. In this article, we examine currently popular machine learning algorithms, combined with various on-line algorithms, on pseudo-random generated data in order to determine which machine learning approach is best suited to on-line prediction for this kind of data. To further improve prediction performance, we propose a novel sample-weighted algorithm that takes the generalization error in each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, two typical examples of pseudo-random generated data. The experimental results show that support vector machines and k-nearest neighbors outperform the other approaches on the evaluation data set, both with and without the sample-weighted algorithm.
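As a rough illustration of the setting described above, the following sketch generates a seed-determined bit stream and scores an on-line learner on it by predicting each bit before updating on it. Everything in it is an assumption made for illustration: the abstract does not identify the Java program or the on-line algorithms, so `lcg_bits` is a hypothetical 48-bit linear congruential generator that borrows the multiplier and increment documented for `java.util.Random` (without its seed scrambling), and `prequential_accuracy` is a plain perceptron. It is not the sample-weighted algorithm proposed in the article.

```python
from typing import List

MASK48 = (1 << 48) - 1  # keep the generator state to 48 bits


def lcg_bits(seed: int, n: int) -> List[int]:
    """Return n bits from an illustrative linear congruential generator.

    The whole sequence is determined by `seed`: same seed, same bits.
    """
    state = seed & MASK48
    bits = []
    for _ in range(n):
        # Constants assumed for illustration (those documented for java.util.Random).
        state = (state * 0x5DEECE66D + 0xB) & MASK48
        bits.append((state >> 47) & 1)  # emit the top bit of the state
    return bits


def prequential_accuracy(bits: List[int], window: int = 8, lr: float = 1.0) -> float:
    """Predict each bit from the previous `window` bits, then update.

    Uses a simple perceptron as a stand-in on-line learner (an assumption,
    not the article's method). Returns the fraction of correct predictions.
    """
    weights = [0.0] * (window + 1)  # last weight acts as the bias
    correct = 0
    total = 0
    for t in range(window, len(bits)):
        x = [2 * b - 1 for b in bits[t - window:t]] + [1]  # map {0,1} to {-1,+1}
        y = 2 * bits[t] - 1
        score = sum(w * xi for w, xi in zip(weights, x))
        pred = 1 if score >= 0 else -1
        if pred == y:
            correct += 1
        else:
            # Perceptron rule: update only on mistakes.
            weights = [w + lr * y * xi for w, xi in zip(weights, x)]
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    stream = lcg_bits(seed=42, n=5000)
    print("on-line accuracy:", prequential_accuracy(stream))
```

Scoring each prediction before the model sees the true value matches the on-line setting, where no separate held-out set is available and every sample serves first as a test case and then as a training case.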
Acknowledgments
This work was supported by the Youth Teacher Startup Fund of South China Normal University (No. 14KJ18), the Natural Science Foundation of Guangdong Province, China (No. 2015A030310509), the National Natural Science Foundation of China (Nos. 61370229 and 61272067), the National Key Technology R&D Program of China (No. 2014BAH28F02) and the S&T Projects of Guangdong Province (Nos. 2014B010103004, 2014B010117007, 2015A030401087, 2015B010110002, 2016B030305004, 2016A030303055 and 2016B010109008).
Cite this article
Zhu, J., Xu, C., Li, Z. et al. An examination of on-line machine learning approaches for pseudo-random generated data. Cluster Comput 19, 1309–1321 (2016). https://doi.org/10.1007/s10586-016-0586-5
DOI: https://doi.org/10.1007/s10586-016-0586-5