Abstract
The exponential growth of social networks has given rise to a wide variety of content, some of which violates the integrity and dignity of users. Detecting such content is challenging: it requires dealing with short texts, poorly written language, unbalanced classes, and non-thematic aspects, all of which can lead to overfitting in the deep neural network (DNN) models used for classification. Empirical evidence from previous studies indicates that some of these problems can be mitigated by improving the optimization of the DNN weights to avoid overfitting. Moreover, a well-defined ordering of the input examples could improve the sequence of patterns learned throughout the optimization process. In this paper, we propose four Curriculum Learning strategies and a new Hybrid Genetic–Gradient Algorithm that improve the performance of DNN models in detecting the class of interest, even on highly imbalanced datasets.
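To illustrate the general curriculum-learning idea mentioned above (this is an illustrative sketch only, not one of the four strategies proposed in the paper), the snippet below orders training examples from easy to hard according to a user-supplied difficulty score before releasing them to the learner in batches; the difficulty measure used here (text length) and the toy data are assumptions for the example.

```python
# Minimal curriculum-learning sketch: sort training examples by an
# assumed per-example difficulty score and yield them to the learner
# in progressively harder batches.

def curriculum_batches(examples, difficulty, batch_size):
    """Yield batches of examples ordered from easiest to hardest,
    where `difficulty` maps an example to a sortable score."""
    ordered = sorted(examples, key=difficulty)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

# Hypothetical toy data: shorter texts are treated as "easier".
texts = ["bad", "great movie", "a confusing, meandering plot twist", "ok"]
batches = list(curriculum_batches(texts, difficulty=len, batch_size=2))
# The first batch contains the two shortest (easiest) texts.
```

In a real training loop, the difficulty score would come from a curriculum criterion (e.g., model loss or linguistic complexity) rather than raw length.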
Data availability statements
The datasets used for the experiments presented in this document are available in the following repositories:
– Aggressiveness: It is available after registration on https://sites.google.com/view/mex-a3t/registration or by contacting the authors M. Montes-y-Gómez [mmontesg@inaoep.mx] and L. Villaseñor-Pineda [villasen@inaoep.mx]
– Grooming: It is available after registration on zenodo.org at https://zenodo.org/record/3713280#.Yq383BxByis or on the official website
– Depression: It is available after filling out an agreement form for research use of the dataset at https://erisk.irlab.org/eRisk2022
– Sentiment: It is available at https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_data.tar.gz and the documentation is available at
Notes
This test used a larger hidden layer, increased to 300 units, so that the models had more parameters; this amplifies the impact of the different mutations and makes the differences between them more evident. Note that we do not report results on the Depression dataset: we did not take them into account when selecting the mutation operator because, on this dataset, the results are poor and unstable.
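For context on why a larger layer amplifies mutation effects, a common form of weight mutation in neuroevolution (shown here only as an illustrative sketch; the paper's specific mutation operators are not reproduced) perturbs each weight with some probability, so a layer with more parameters accumulates more perturbations per mutation on average. The `sigma` and `rate` parameters below are hypothetical values.

```python
import random

def gaussian_mutation(weights, sigma=0.1, rate=0.05, rng=None):
    """Return a copy of `weights` in which each entry is perturbed,
    with probability `rate`, by zero-mean Gaussian noise of scale
    `sigma`. With more parameters (e.g., a 300-unit hidden layer),
    more weights are mutated on average per application."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w
            for w in weights]
```

With `rate=0.05`, a 300-unit layer mutates roughly three times as many weights per call as a 100-unit layer, which is why the differences between operators become more evident.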
Acknowledgements
The authors thank Consejo Nacional de Ciencia y Tecnología (CONACYT), Centro de Investigación en Matemáticas (CIMAT) and Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies (Laboratorio de Supercómputo: Plataforma de Aprendizaje Profundo) with the project “Identification of Aggressive and Offensive text through specialized BERT’s ensembles” and CIMAT Bajio Supercomputing Laboratory (#300832). Sanchez-Vega would like to thank CONACYT for its support through grant projects “Algoritmos de procesamiento del lenguaje natural para la modelación y análisis de la violencia textual con aplicación en documentos históricos” (ID. BP-FP-20201015143044227-814705), the Program “Investigadoras e Investigadores por México” by the project “Desarrollo de Inteligencia Artificial aplicada a la prevención de violencia y salud mental.” (ID. 11989, No. 1311) and “Ciencia de datos aplicado al análisis de expedientes de personas desaparecidas” (No. 617, Conv. 2020-01, ID. 314967).
Funding
Parts of this work were supported by Consejo Nacional de Ciencia y Tecnología, Centro de Investigación en Matemáticas, and Instituto Nacional de Astrofísica, Óptica y Electrónica through the project funds and grants declared in the Acknowledgements section.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the funding or the results obtained in this research work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Elías-Miranda, A.A., Vallejo-Aldana, D., Sánchez-Vega, F. et al. Curriculum learning and evolutionary optimization into deep learning for text classification. Neural Comput & Applic 35, 21129–21164 (2023). https://doi.org/10.1007/s00521-023-08632-8