Curriculum learning and evolutionary optimization into deep learning for text classification

  • Original Article
  • Published:

Neural Computing and Applications

Abstract

The exponential growth of social networks has given rise to a wide variety of content. Some of this content violates the integrity and dignity of users, and detecting it automatically is a challenging task: it requires dealing with short texts, poorly written language, unbalanced classes, and non-thematic aspects, all of which can lead to overfitting in the deep neural network (DNN) models used for classification. Empirical evidence from previous studies indicates that some of these problems can be overcome by improving the optimization of the DNN weights to avoid overfitting. Moreover, a well-defined ordering of the input examples could improve the sequence of patterns learned throughout the optimization process. In this paper, we propose four Curriculum Learning strategies and a new Hybrid Genetic–Gradient Algorithm, which are shown to improve the performance of DNN models in detecting the class of interest, even on highly imbalanced datasets.
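
To make the two components named above concrete, the following minimal sketch illustrates (a) an easy-to-hard curriculum ordering of training examples and (b) a single hybrid genetic–gradient update that follows a gradient step with Gaussian mutation and elitist selection. This is a generic illustration under assumed interfaces, not the authors' implementation: difficulty_fn, grad_fn, and loss_fn are hypothetical placeholders, and the paper itself defines four specific curriculum strategies and its own hybrid algorithm.

    import numpy as np

    # Minimal sketch, not the paper's implementation: a curriculum ordering
    # plus one hybrid genetic-gradient update step. difficulty_fn, grad_fn,
    # and loss_fn are assumed, user-supplied callables.

    def curriculum_order(examples, difficulty_fn):
        """Order training examples from easiest to hardest, one common
        curriculum learning scheme (the paper proposes four strategies)."""
        return sorted(examples, key=difficulty_fn)

    def hybrid_genetic_gradient_step(weights, grad_fn, loss_fn,
                                     lr=0.01, pop_size=8, sigma=0.05, rng=None):
        """Take one gradient descent step, then keep the fittest of several
        Gaussian mutations of the result (elitist selection)."""
        rng = rng or np.random.default_rng(0)
        candidate = weights - lr * grad_fn(weights)   # gradient phase
        population = [candidate] + [                  # genetic phase: mutate
            candidate + sigma * rng.standard_normal(candidate.shape)
            for _ in range(pop_size - 1)
        ]
        return min(population, key=loss_fn)           # keep the best candidate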

Data availability statements

The datasets used for the experiments presented in this document are available in the following repositories:

Aggressiveness: It is available after registration at https://sites.google.com/view/mex-a3t/registration or by contacting the authors M. Montes-y-Gómez [mmontesg@inaoep.mx] and L. Villaseñor-Pineda [villasen@inaoep.mx].

Grooming: It is available after registration on zenodo.org at https://zenodo.org/record/3713280#.Yq383BxByis or on the official website https://pan.webis.de/data.

Depression: It is available after completing an agreement form for the use of the dataset for research at https://erisk.irlab.org/eRisk2022.

Sentiment: It is available at https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_data.tar.gz and the documentation is available at https://www.cs.cornell.edu/people/pabo/movie-review-data/.

Notes

  1. This test used a larger hidden layer, increased to 300 units, so that the models had more parameters; this amplifies the impact of the different mutations and makes their differences more evident. We do not report results on the Depression dataset because they were poor and unstable, so we did not take them into account when selecting the mutation operator.


Acknowledgements

The authors thank Consejo Nacional de Ciencia y Tecnología (CONACYT), Centro de Investigación en Matemáticas (CIMAT) and Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies (Laboratorio de Supercómputo: Plataforma de Aprendizaje Profundo) under the project “Identification of Aggressive and Offensive text through specialized BERT’s ensembles”, and through the CIMAT Bajio Supercomputing Laboratory (#300832). Sánchez-Vega thanks CONACYT for its support through the grant project “Algoritmos de procesamiento del lenguaje natural para la modelación y análisis de la violencia textual con aplicación en documentos históricos” (ID. BP-FP-20201015143044227-814705) and, within the program “Investigadoras e Investigadores por México”, through the projects “Desarrollo de Inteligencia Artificial aplicada a la prevención de violencia y salud mental” (ID. 11989, No. 1311) and “Ciencia de datos aplicado al análisis de expedientes de personas desaparecidas” (No. 617, Conv. 2020-01, ID. 314967).

Funding

Parts of this work were supported by Consejo Nacional de Ciencia y Tecnología, Centro de Investigación en Matemáticas, and Instituto Nacional de Astrofísica, Óptica y Electrónica through the project funds and grants declared in the Acknowledgements section.

Author information

Corresponding author

Correspondence to A. Pastor López-Monroy.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the funding or the results obtained in this research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Elías-Miranda, A.A., Vallejo-Aldana, D., Sánchez-Vega, F. et al. Curriculum learning and evolutionary optimization into deep learning for text classification. Neural Comput & Applic 35, 21129–21164 (2023). https://doi.org/10.1007/s00521-023-08632-8

