Skill Extraction for Domain-Specific Text Retrieval in a Job-Matching Platform

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Abstract

We discuss a domain-specific retrieval application for matching job seekers with open positions. It uses a novel syntactic method for extracting skill-terms from the text of natural-language job advertisements. We contrast this method with two word-embedding methods based on word2vec. We define the notion of a skill headword and present an algorithm that learns syntactic dependency patterns to recognize skill-terms. On all metrics, our syntactic method outperforms both word-embedding methods. Moreover, the word-embedding approaches were unable to model a meaningful distinction between skill-terms and non-skill-terms, whereas our syntactic approach made this distinction successfully. We also show how the extracted skills can be used to automatically construct a semantic job-skills ontology and to facilitate a job-to-candidate matching system.
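
To make the idea of learning syntactic dependency patterns concrete, the following is a minimal illustrative sketch and not the algorithm from the paper, whose details are not given in this abstract. It assumes sentences are already dependency-parsed (the labels follow the style of Stanford typed dependencies), represents a pattern as the chain of (relation, governor word) pairs above a token, and keeps patterns that mostly dominate a small seed set of known skill words. The names `Token`, `pattern_of`, `learn_patterns`, `extract_skills`, the seed list, and the toy sentences are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    head: int    # index of the governing token, -1 for the root
    deprel: str  # dependency label linking this token to its head


def sent(rows):
    """Build a toy pre-parsed sentence from (text, head, deprel) rows."""
    return [Token(*r) for r in rows]


def pattern_of(sentence, i, depth=2):
    """Chain of (relation, governor word) pairs, walking up at most `depth` heads."""
    path, tok = [], sentence[i]
    for _ in range(depth):
        if tok.head < 0:
            break
        head = sentence[tok.head]
        path.append((tok.deprel, head.text.lower()))
        tok = head
    return tuple(path)


def learn_patterns(sentences, seed_skills, min_precision=0.8):
    """Keep patterns whose tokens are seed skills at least `min_precision` of the time."""
    hits, total = Counter(), Counter()
    for s in sentences:
        for i, tok in enumerate(s):
            p = pattern_of(s, i)
            total[p] += 1
            if tok.text.lower() in seed_skills:
                hits[p] += 1
    return {p for p in total if hits[p] / total[p] >= min_precision}


def extract_skills(sentence, patterns):
    """Return tokens whose dependency context matches a learned pattern."""
    return [t.text for i, t in enumerate(sentence) if pattern_of(sentence, i) in patterns]


# Hypothetical seed skills and hand-parsed training sentences (assumptions).
SEED_SKILLS = {"python", "sql"}
TRAIN = [
    sent([("We", 1, "nsubj"), ("require", -1, "root"), ("experience", 1, "dobj"),
          ("in", 2, "prep"), ("Python", 3, "pobj")]),
    sent([("We", 1, "nsubj"), ("have", -1, "root"), ("an", 3, "det"),
          ("office", 1, "dobj"), ("in", 3, "prep"), ("Zurich", 4, "pobj")]),
    sent([("You", 1, "nsubj"), ("have", -1, "root"), ("experience", 1, "dobj"),
          ("in", 2, "prep"), ("SQL", 3, "pobj")]),
]
NEW_AD = sent([("Candidates", 1, "nsubj"), ("need", -1, "root"), ("experience", 1, "dobj"),
               ("in", 2, "prep"), ("Kubernetes", 3, "pobj")])

if __name__ == "__main__":
    patterns = learn_patterns(TRAIN, SEED_SKILLS)
    print(extract_skills(NEW_AD, patterns))  # ['Kubernetes']
```

In this toy setup the word "Kubernetes" is never seen during training; it is picked up only because it sits in the same syntactic position ("experience in X") as the seed skills, while "Zurich" in "office in Zurich" is not.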

Notes

  1. This work was funded by CTI/innosuisse under contract no. 27177.2 PFES-ES.

Acknowledgements

We thank our partners at Skillue AG, Basel, Switzerland, for their contributions to this work.

Author information

Corresponding author

Correspondence to Ellery Smith.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Smith, E., Weiler, A., Braschler, M. (2021). Skill Extraction for Domain-Specific Text Retrieval in a Job-Matching Platform. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_10

  • DOI: https://doi.org/10.1007/978-3-030-85251-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

  • eBook Packages: Computer Science, Computer Science (R0)