
Improved randomized learning algorithms for imbalanced and noisy educational data classification


Abstract

Although neural networks have demonstrated strong potential for constructing learners with good predictive performance, several uncertainty issues can still greatly affect the effectiveness of supervised learning algorithms, notably class imbalance and labeling errors (class noise). Technically, imbalanced data makes it harder for a learning algorithm to distinguish between classes, while labeling errors can lead to an unreasonable problem formulation built on incorrect hypotheses. Both problems are pervasive in educational data analytics. This study develops improved randomized learning algorithms based on a novel type of cost function that addresses the combined effects of class imbalance and class noise. Instead of treating these uncertainty issues in isolation, we present a convex combination of robust and imbalance-aware modelling objectives, yielding a generalized formulation of weighted least squares problems from which the improved randomized learner models can be built. An experimental study on several educational data classification tasks verifies the advantages of the proposed algorithms over existing methods that either take no account of class imbalance and labeling errors, or consider only one of the two aspects.
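The key computational step in this formulation, a weighted least squares solve whose per-sample weights are a convex combination of a robustness term and a class-imbalance term, can be sketched compactly. The Python code below is a minimal illustration under assumed choices, not the authors' exact algorithm: the random hidden layer is RVFL-style, the imbalance weights use inverse class frequency, the robustness weights damp large residuals with a simple quadratic kernel, and `fit`, `predict`, and the mixing parameter `gamma` are all hypothetical names.

```python
import numpy as np

def fit(X, y, n_hidden=100, gamma=0.5, lam=1e-3, n_iter=5, seed=0):
    """Randomized learner whose output weights solve a weighted least
    squares problem; gamma convexly mixes robustness weights with
    class-imbalance weights (all weighting schemes here are assumptions)."""
    rng = np.random.default_rng(seed)
    # RVFL-style random hidden layer: weights are drawn once and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid random features
    T = np.eye(int(y.max()) + 1)[y]          # one-hot targets, labels 0..K-1

    # Cost-sensitive weights: inverse class frequency (assumed scheme).
    _, counts = np.unique(y, return_counts=True)
    w_imb = (counts.sum() / counts)[y]
    w_imb = w_imb / w_imb.mean()

    w = w_imb.copy()                         # start from imbalance weights alone
    for _ in range(n_iter):
        # Weighted, ridge-regularized least squares for output weights beta.
        Hw = H * w[:, None]
        beta = np.linalg.solve(H.T @ Hw + lam * np.eye(n_hidden), Hw.T @ T)

        # Robustness weights: down-weight samples with large residuals
        # (a simple quadratic kernel; the paper's robust term may differ).
        r = np.linalg.norm(T - H @ beta, axis=1)
        w_rob = np.clip(1.0 - (r / (2.0 * np.median(r) + 1e-12)) ** 2, 0.0, 1.0)

        # Convex combination of the two modelling objectives.
        w = gamma * w_rob + (1.0 - gamma) * w_imb

    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

Because only the sample weights change between iterations, each refit is a single ridge-style linear solve over the fixed random features, which is what keeps randomized learners of this kind cheap to reweight.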


Notes

  1. http://www.deepscn.com/.

  2. http://archive.ics.uci.edu/ml/datasets/turkiye+student+evaluation.

  3. http://www.worlduc.com/.


Author information

Correspondence to Changqin Huang.

Additional information

This work was supported by the National Natural Science Foundation of China (No. 61802132 and 61877020), the China Postdoctoral Science Foundation Grant (No. 2018M630959), and the S&T Projects of Guangdong Province (No. 2016B010109008).


Cite this article

Li, M., Huang, C., Wang, D. et al. Improved randomized learning algorithms for imbalanced and noisy educational data classification. Computing 101, 571–585 (2019). https://doi.org/10.1007/s00607-018-00698-w
