Transformers and Spark for Automated CV Classification in Arabophone Regions

  • Conference paper
Arabic Language Processing: From Theory to Practice (ICALP 2024)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2339)

Abstract

The growing demand for skilled labor in Arabophone regions has increased interest in automating Curriculum Vitae (CV) classification. In this study, we present a methodology that combines Transformer models with the Spark platform to automatically classify CVs written in Arabic. We first preprocess the CVs using Arabic-specific text processing techniques, including tokenization and normalization. We then use a pre-trained Transformer model, adapted to the Arabic linguistic context, to extract relevant features from the CVs. These features are fed into a Spark pipeline for classification. Thanks to Spark's scalability, we were able to process large quantities of CVs quickly, making the approach a practical solution for recruitment companies. This research paves the way for more efficient and accurate automation of Arabic CV classification, helping to facilitate recruitment in Arabic-speaking regions.
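The preprocessing step named in the abstract (normalization followed by tokenization) can be illustrated with a minimal sketch. The abstract does not say which normalization rules the authors apply, so the rules below (stripping diacritics, removing tatweel, unifying alef variants) are assumptions chosen because they are common in Arabic text processing; the function names `normalize_arabic` and `tokenize` are hypothetical and are not taken from the paper.

```python
import re
from typing import List

# Assumed normalization rules (not confirmed by the paper):
# 1. strip diacritics (tashkeel, U+064B..U+0652) and superscript alef (U+0670)
# 2. remove tatweel, the elongation character (U+0640)
# 3. map alef variants (U+0622, U+0623, U+0625) to bare alef (U+0627)
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")

# A token is a run of Arabic letters or a run of ASCII letters/digits,
# so mixed-language CVs (e.g. Arabic text with "Python") are handled.
_TOKEN = re.compile(r"[\u0621-\u064A]+|[A-Za-z0-9]+")


def normalize_arabic(text: str) -> str:
    """Apply the assumed normalization rules to a string."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = _ALEF_VARIANTS.sub("\u0627", text)
    return text


def tokenize(text: str) -> List[str]:
    """Normalize the text, then split it into word tokens."""
    return _TOKEN.findall(normalize_arabic(text))
```

In a pipeline like the one the abstract describes, tokens of this kind would be produced before feature extraction; a pre-trained Transformer normally ships with its own subword tokenizer, so in practice only the normalization half of this sketch might be needed upstream of the model.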

Author information

Corresponding author

Correspondence to Soumia Chafi.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chafi, S., Kabil, M., Kamouss, A. (2025). Transformers and Spark for Automated CV Classification in Arabophone Regions. In: Hdioud, B., Aouragh, S.L. (eds) Arabic Language Processing: From Theory to Practice. ICALP 2024. Communications in Computer and Information Science, vol 2339. Springer, Cham. https://doi.org/10.1007/978-3-031-79164-2_14

  • DOI: https://doi.org/10.1007/978-3-031-79164-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-79163-5

  • Online ISBN: 978-3-031-79164-2

  • eBook Packages: Computer Science, Computer Science (R0)
