Transformers and Spark for Automated CV Classification in Arabophone Regions

Chafi, Soumia; Kabil, Mustapha; Kamouss, Abdessamad

doi:10.1007/978-3-031-79164-2_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2339)

Included in the following conference series:

International Conference on Arabic Language Processing

Abstract

The growing demand for skilled labor in Arabophone regions has increased interest in automating Curriculum Vitae (CV) classification. In this study, we present a methodology that combines Transformer models with the Spark platform to classify CVs written in Arabic automatically. We first preprocess the CVs using Arabic-specific text processing techniques, including tokenization and normalization. We then use a pre-trained Transformer model, adapted to the Arabic linguistic context, to extract relevant features from the CVs. These features are fed into a Spark pipeline for classification. Thanks to Spark's scalability, we were able to process large quantities of CVs quickly, which makes the approach practical for recruitment companies. This research paves the way for more efficient and accurate automation of Arabic CV classification, helping to facilitate recruitment in Arabic-speaking regions.
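
The abstract does not say which normalization rules the authors apply, so the following is only a minimal sketch of what an Arabic-specific preprocessing step of this kind might look like. It assumes a common set of rules (removing diacritics and tatweel, unifying alef variants, mapping alef maqsura to yeh and teh marbuta to heh) and a simple regular-expression tokenizer. The function names `normalize_arabic` and `tokenize` are hypothetical and are not taken from the paper.

```python
import re

# Assumed rule set, not the paper's: a typical light Arabic normalization.
# Short vowels and other tashkeel marks (U+064B..U+0652) plus superscript alef.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character used for justification

_CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh (lossy, but common)
})

# A token is a run of Arabic letters or a run of Latin letters/digits.
_TOKEN = re.compile(r"[\u0621-\u064A]+|[A-Za-z0-9]+")


def normalize_arabic(text):
    """Strip diacritics and tatweel, then unify common letter variants."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return text.translate(_CHAR_MAP)


def tokenize(text):
    """Normalize the text and split it into Arabic and Latin/digit tokens."""
    return _TOKEN.findall(normalize_arabic(text))
```

The teh marbuta and alef maqsura mappings reduce spelling variation at the cost of merging some distinct words, so whether to apply them depends on the downstream model. A pre-trained Arabic Transformer usually ships with its own subword tokenizer, in which case only the normalization part of this sketch would apply before feature extraction.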

Author information

Authors and Affiliations

FSTM, Hassan II University Mohamedia, Mohammedia, Morocco
Soumia Chafi, Mustapha Kabil & Abdessamad Kamouss

Authors

Soumia Chafi
Mustapha Kabil
Abdessamad Kamouss

Corresponding author

Correspondence to Soumia Chafi.

Editor information

Editors and Affiliations

ENSIAS, Mohammed V University, Rabat, Morocco
Boutaina Hdioud
ENSIAS, Mohammed V University, Rabat, Morocco
Si Lhoussain Aouragh

About this paper

Cite this paper

Chafi, S., Kabil, M., Kamouss, A. (2025). Transformers and Spark for Automated CV Classification in Arabophone Regions. In: Hdioud, B., Aouragh, S.L. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2024. Communications in Computer and Information Science, vol 2339. Springer, Cham. https://doi.org/10.1007/978-3-031-79164-2_14

DOI: https://doi.org/10.1007/978-3-031-79164-2_14
Published: 02 February 2025
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-79163-5
Online ISBN: 978-3-031-79164-2
eBook Packages: Computer Science, Computer Science (R0)
