An examination of on-line machine learning approaches for pseudo-random generated data

Zhu, Jia; Xu, Chuanhua; Li, Zhixu; Fung, Gabriel; Lin, Xueqin; Huang, Jin; Huang, Changqin

doi:10.1007/s10586-016-0586-5

Published: 27 June 2016

Cluster Computing, Volume 19, pages 1309–1321 (2016)

Abstract

A pseudo-random generator is an algorithm that generates a sequence of objects from a seed. Although the seed may be truly random, the resulting sequence is fully determined by it and is therefore not truly random. Pseudo-random generators are widely used in applications such as cryptography and simulation. In this article, we examine popular machine learning algorithms combined with various on-line learning algorithms on pseudo-random generated data, in order to determine which machine learning approach is best suited to on-line prediction on this kind of data. To further improve prediction performance, we propose a novel sample weighted algorithm that takes the generalization error in each iteration into account. We perform an extensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, two typical examples of pseudo-random generated data. The experimental results show that support vector machines and k-nearest neighbors outperform the other approaches on the evaluation data, both with and without the sample weighted algorithm.
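The setting described in the abstract can be illustrated with a small sketch. It rests on assumptions of our own that are not taken from the paper. First, the pseudo-random source is modelled as a 48-bit linear congruential generator using the update rule and constants documented for `java.util.Random`; the paper only says "a popular Java program". Second, each sample's features are the previous `window` outcomes. Third, on-line evaluation is done test-then-train, meaning each outcome is predicted before the model learns it. `JavaStyleLCG`, `OnlineKNN`, `hamming` and `prequential_accuracy` are hypothetical names. This is not the authors' implementation, and their sample weighted algorithm is not reproduced here.

```python
import heapq
from collections import Counter, deque
from typing import Deque, List, Tuple


class JavaStyleLCG:
    """48-bit linear congruential generator using the update rule documented
    for java.util.Random. Assumption: the Java program in the paper may differ."""

    MULTIPLIER = 0x5DEECE66D
    ADDEND = 0xB
    MASK = (1 << 48) - 1

    def __init__(self, seed: int) -> None:
        # java.util.Random XORs the seed with the multiplier before first use,
        # so the whole output sequence is fixed once the seed is fixed.
        self.state = (seed ^ self.MULTIPLIER) & self.MASK

    def next_bits(self, bits: int) -> int:
        # Advance the state and return its top `bits` bits (use 1 <= bits <= 31).
        self.state = (self.state * self.MULTIPLIER + self.ADDEND) & self.MASK
        return self.state >> (48 - bits)


def hamming(a: int, b: int) -> int:
    # Number of positions where two bit-packed windows differ.
    return bin(a ^ b).count("1")


class OnlineKNN:
    """Hypothetical memory-based k-NN that learns one labelled sample at a time."""

    def __init__(self, k: int = 5, max_memory: int = 2000) -> None:
        self.k = k
        # Oldest samples are evicted once the bounded memory is full.
        self.memory: Deque[Tuple[int, int]] = deque(maxlen=max_memory)

    def predict(self, x: int) -> int:
        if not self.memory:
            return 0  # arbitrary default before any sample has been stored
        # nsmallest behaves like sorted(...)[:k], so older samples win equal distances.
        nearest = heapq.nsmallest(self.k, self.memory, key=lambda s: hamming(s[0], x))
        # Majority vote; ties resolve to the label encountered first in distance order.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn(self, x: int, y: int) -> None:
        self.memory.append((x, y))


def prequential_accuracy(bits: List[int], window: int, model: OnlineKNN) -> float:
    """Test-then-train over a 0/1 stream: predict bits[t] from the previous
    `window` bits, score the prediction, then let the model learn the sample."""
    correct = total = 0
    for t in range(window, len(bits)):
        x = 0
        for bit in bits[t - window:t]:
            x = (x << 1) | bit  # pack the window into an integer
        correct += int(model.predict(x) == bits[t])
        model.learn(x, bits[t])
        total += 1
    return correct / total


if __name__ == "__main__":
    rng = JavaStyleLCG(seed=42)
    stream = [rng.next_bits(1) for _ in range(1000)]
    print("LCG stream:", prequential_accuracy(stream, window=8, model=OnlineKNN(k=5)))
    print("periodic stream:", prequential_accuracy([0, 1] * 500, window=8, model=OnlineKNN(k=5)))
```

Running the same on-line learner on the generator's output and on a strictly periodic stream shows how much structure a short window of past outcomes exposes to a memory-based method. No particular accuracy figure for the paper's data is implied. Swapping `OnlineKNN` for another incremental learner, such as an on-line linear SVM, would follow the same predict-then-learn loop.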

Similar content being viewed by others

Data Driven Generation of Synthetic Data with Support Vector Data Description

An Empirical Comparison of Support Vector Machines Versus Nearest Neighbour Methods for Machine Learning Applications

A Scalable Boosting Learner Using Adaptive Sampling

Acknowledgments

This work was supported by the Youth Teacher Startup Fund of South China Normal University (No. 14KJ18), the Natural Science Foundation of Guangdong Province, China (No. 2015A030310509), the National Natural Science Foundation of China (Nos. 61370229 and 61272067), the National Key Technology R&D Program of China (No. 2014BAH28F02) and the S&T Projects of Guangdong Province (Nos. 2014B010103004, 2014B010117007, 2015A030401087, 2015B010110002, 2016B030305004, 2016A030303055 and 2016B010109008).

Author information

Authors and Affiliations

School of Computer Science, South China Normal University, Guangzhou, China
Jia Zhu, Chuanhua Xu, Xueqin Lin, Jin Huang & Changqin Huang
School of Computer Science and Technology, Soochow University, Soochow, China
Zhixu Li
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
Gabriel Fung

Corresponding author

Correspondence to Jia Zhu.

About this article

Cite this article

Zhu, J., Xu, C., Li, Z. et al. An examination of on-line machine learning approaches for pseudo-random generated data. Cluster Comput 19, 1309–1321 (2016). https://doi.org/10.1007/s10586-016-0586-5

Received: 19 March 2016
Revised: 12 June 2016
Accepted: 20 June 2016
Published: 27 June 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s10586-016-0586-5
