Abstract
The exponential growth of social networks has given rise to a wide variety of content, some of which violates the integrity and dignity of users. Detecting such content is challenging: it requires dealing with short texts, poorly written language, unbalanced classes, and non-thematic aspects, all of which can lead to overfitting in the deep neural network (DNN) models used for classification. Empirical evidence from previous studies indicates that some of these problems can be mitigated by improving the optimization of the DNN weights to avoid overfitting. Moreover, a well-defined ordering of the input examples could improve the sequence of patterns learned throughout the optimization process. In this paper, we propose four Curriculum Learning strategies and a new Hybrid Genetic–Gradient Algorithm that improve the performance of DNN models in detecting the class of interest, even on highly imbalanced datasets.
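To illustrate the general curriculum-learning idea mentioned above (this is an illustrative sketch only, not one of the four strategies proposed in the paper), the snippet below orders training examples from easy to hard according to a user-supplied difficulty score before releasing them to the learner in batches; the difficulty measure used here (text length) and the toy data are assumptions for the example.

```python
# Minimal curriculum-learning sketch: sort training examples by an
# assumed per-example difficulty score and yield them to the learner
# in progressively harder batches.

def curriculum_batches(examples, difficulty, batch_size):
    """Yield batches of examples ordered from easiest to hardest,
    where `difficulty` maps an example to a sortable score."""
    ordered = sorted(examples, key=difficulty)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

# Hypothetical toy data: shorter texts are treated as "easier".
texts = ["bad", "great movie", "a confusing, meandering plot twist", "ok"]
batches = list(curriculum_batches(texts, difficulty=len, batch_size=2))
# The first batch contains the two shortest (easiest) texts.
```

In a real training loop, the difficulty score would come from a curriculum criterion (e.g., model loss or linguistic complexity) rather than raw length.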
Data availability statements
The datasets used for the experiments presented in this document are available in the following repositories:
– Aggressiveness: It is available after registration on https://sites.google.com/view/mex-a3t/registration or by contacting the authors M. Montes-y-Gómez [mmontesg@inaoep.mx] and L. Villaseñor-Pineda [villasen@inaoep.mx]
– Grooming: It is available after registration on zenodo.org at https://zenodo.org/record/3713280#.Yq383BxByis or on the official website
– Depression: It is available after filling out an agreement form for research use of the dataset at https://erisk.irlab.org/eRisk2022
– Sentiment: It is available at https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_data.tar.gz and the documentation is available at
Notes
This test used a larger hidden layer, increased to 300 units, so that the models had more parameters; this amplifies the impact of the different mutations and makes the differences between them more evident. Note that we do not report results on the Depression dataset: we did not take them into account when selecting the mutation operator because, on this dataset, the results are poor and unstable.
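For context on why a larger layer amplifies mutation effects, a common form of weight mutation in neuroevolution (shown here only as an illustrative sketch; the paper's specific mutation operators are not reproduced) perturbs each weight with some probability, so a layer with more parameters accumulates more perturbations per mutation on average. The `sigma` and `rate` parameters below are hypothetical values.

```python
import random

def gaussian_mutation(weights, sigma=0.1, rate=0.05, rng=None):
    """Return a copy of `weights` in which each entry is perturbed,
    with probability `rate`, by zero-mean Gaussian noise of scale
    `sigma`. With more parameters (e.g., a 300-unit hidden layer),
    more weights are mutated on average per application."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in weights]
```

With `rate=0.05`, a 300-unit layer mutates roughly three times as many weights per call as a 100-unit layer, which is why the differences between operators become more evident.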
Acknowledgements
The authors thank Consejo Nacional de Ciencia y Tecnología (CONACYT), Centro de Investigación en Matemáticas (CIMAT) and Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies (Laboratorio de Supercómputo: Plataforma de Aprendizaje Profundo) with the project “Identification of Aggressive and Offensive text through specialized BERT’s ensembles” and CIMAT Bajio Supercomputing Laboratory (#300832). Sanchez-Vega would like to thank CONACYT for its support through grant projects “Algoritmos de procesamiento del lenguaje natural para la modelación y análisis de la violencia textual con aplicación en documentos históricos” (ID. BP-FP-20201015143044227-814705), the Program “Investigadoras e Investigadores por México” by the project “Desarrollo de Inteligencia Artificial aplicada a la prevención de violencia y salud mental.” (ID. 11989, No. 1311) and “Ciencia de datos aplicado al análisis de expedientes de personas desaparecidas” (No. 617, Conv. 2020-01, ID. 314967).
Funding
Parts of this work were supported by Consejo Nacional de Ciencia y Tecnología, Centro de Investigación en Matemáticas, and Instituto Nacional de Astrofísica, Óptica y Electrónica through the project funds and grants declared in the Acknowledgements section.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the funding or the results obtained in this research work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Elías-Miranda, A.A., Vallejo-Aldana, D., Sánchez-Vega, F. et al. Curriculum learning and evolutionary optimization into deep learning for text classification. Neural Comput & Applic 35, 21129–21164 (2023). https://doi.org/10.1007/s00521-023-08632-8