Abstract
The growing demand for skilled labor in Arabophone regions has led to increased interest in automating the classification of Curricula Vitae (CVs). In this study, we present a methodology that combines Transformer models with the scalability of the Apache Spark platform to automatically classify CVs written in Arabic. We first preprocess the CVs using Arabic-specific text processing techniques, including tokenization and normalization. We then use a pre-trained Transformer model adapted to the Arabic linguistic context to extract relevant features from the CVs. These features are fed into a Spark pipeline for classification. Because Spark scales out across a cluster, we were able to process large quantities of CVs quickly, which makes the approach practical for recruitment companies. This research paves the way for more efficient and accurate automation of Arabic CV classification and can help streamline recruitment in Arabic-speaking regions.
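The preprocessing step described above (normalization followed by tokenization) can be sketched as follows. The paper does not list its exact normalization rules, so this minimal Python sketch assumes a common convention: strip diacritics and tatweel, unify alef variants, map alef maksura to yeh and ta marbuta to heh, then split on non-word characters. The function names `normalize_arabic` and `tokenize` are hypothetical, and a pre-trained Transformer would normally apply its own subword tokenizer on top of text cleaned this way.

```python
import re

# Arabic diacritics (tashkeel): fathatan (U+064B) through sukun (U+0652),
# plus the superscript (dagger) alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # kashida, used only to stretch words visually

# Letter-variant mapping applied after diacritics are removed.
# Assumed convention for illustration, not necessarily the authors' exact rules.
_CHAR_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0671": "\u0627",  # alef wasla -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # ta marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Remove diacritics and tatweel, unify letter variants, lowercase Latin text."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_CHAR_MAP)
    # Arabic has no case; lower() only affects Latin terms such as "Python".
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Normalize, then keep runs of Unicode word characters as tokens."""
    return re.findall(r"\w+", normalize_arabic(text))
```

Applied to a CV fragment such as "خبرة، Python SQL", `tokenize` returns `['خبره', 'python', 'sql']`: the Arabic comma is discarded as punctuation and ta marbuta is folded into heh. Folding letter variants is lossy, but it reduces spelling variation in free-form CV text so that equivalent words map to the same token.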
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chafi, S., Kabil, M., Kamouss, A. (2025). Transformers and Spark for Automated CV Classification in Arabophone Regions. In: Hdioud, B., Aouragh, S.L. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2024. Communications in Computer and Information Science, vol 2339. Springer, Cham. https://doi.org/10.1007/978-3-031-79164-2_14
DOI: https://doi.org/10.1007/978-3-031-79164-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79163-5
Online ISBN: 978-3-031-79164-2
eBook Packages: Computer Science, Computer Science (R0)