An examination of on-line machine learning approaches for pseudo-random generated data

Cluster Computing

Abstract

A pseudo-random generator is an algorithm that produces a sequence of objects which is fully determined by a truly random seed but is not itself truly random. Such generators are widely used in applications such as cryptography and simulation. In this article, we examine popular machine learning algorithms combined with various on-line algorithms on pseudo-random generated data, in order to find out which machine learning approach is most suitable for on-line prediction on this kind of data. To further improve prediction performance, we propose a novel sample-weighted algorithm that takes the generalization error in each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, which are two typical examples of pseudo-random generated data. The experimental results show that support vector machines and k-nearest neighbors perform better than the other approaches on the evaluation data set, both with and without the sample-weighted algorithm.
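The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the generator is a generic linear congruential generator with illustrative constants, the learner is a plain on-line perceptron, and the names `lcg_bits` and `prequential_accuracy`, the window size, and the learning rate are all assumptions made for this example. The evaluation is "test-then-train": each new value is predicted before it is used to update the model.

```python
def lcg_bits(seed, n, a=1103515245, c=12345, m=2**31):
    """Return n pseudo-random bits from a linear congruential generator.

    The constants are illustrative assumptions, not taken from the article.
    """
    state = seed
    bits = []
    for _ in range(n):
        state = (a * state + c) % m
        # Use a middle bit; the lowest bits of a power-of-two LCG repeat quickly.
        bits.append((state >> 16) & 1)
    return bits


def prequential_accuracy(bits, window=8, lr=1.0):
    """Test-then-train accuracy of an on-line perceptron.

    Each example uses the previous `window` bits (mapped to -1/+1)
    as features and the next bit as the label.
    """
    weights = [0.0] * window
    bias = 0.0
    correct = 0
    total = 0
    for t in range(window, len(bits)):
        x = [2 * b - 1 for b in bits[t - window:t]]
        y = 2 * bits[t] - 1
        # Predict first, so the score reflects unseen data only.
        score = bias + sum(w * xi for w, xi in zip(weights, x))
        pred = 1 if score >= 0 else -1
        correct += int(pred == y)
        total += 1
        # Perceptron rule: update the model only after a mistake.
        if pred != y:
            weights = [w + lr * y * xi for w, xi in zip(weights, x)]
            bias += lr * y
    return correct / total


if __name__ == "__main__":
    bits = lcg_bits(seed=42, n=5000)
    print(f"prequential accuracy: {prequential_accuracy(bits):.3f}")
```

Swapping the perceptron for another learner, or weighting samples during the update, would change only the body of the loop; the test-then-train structure stays the same.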

Notes

  1. http://www.wikihow.com/Play-Baccarat.

  2. http://spark.apache.org/.

Acknowledgments

This work was supported by the Youth Teacher Startup Fund of South China Normal University (No. 14KJ18), the Natural Science Foundation of Guangdong Province, China (No. 2015A030310509), the National Natural Science Foundation of China (Nos. 61370229 and 61272067), the National Key Technology R&D Program of China (No. 2014BAH28F02) and the S&T Projects of Guangdong Province (Nos. 2014B010103004, 2014B010117007, 2015A030401087, 2015B010110002, 2016B030305004, 2016A030303055 and 2016B010109008).

Author information

Corresponding author

Correspondence to Jia Zhu.

Cite this article

Zhu, J., Xu, C., Li, Z. et al. An examination of on-line machine learning approaches for pseudo-random generated data. Cluster Comput 19, 1309–1321 (2016). https://doi.org/10.1007/s10586-016-0586-5

  • DOI: https://doi.org/10.1007/s10586-016-0586-5
