Skill Extraction for Domain-Specific Text Retrieval in a Job-Matching Platform

Smith, Ellery; Weiler, Andreas; Braschler, Martin

doi:10.1007/978-3-030-85251-1_10


Conference paper
First Online: 14 September 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Abstract

We discuss a domain-specific retrieval application for matching job seekers with open positions. The application uses a novel syntactic method to extract skill-terms from the text of natural-language job advertisements. We contrast the new method with two word-embedding methods based on word2vec. We define the notion of a skill headword and present an algorithm that learns syntactic dependency patterns to recognize skill-terms. On all metrics, our syntactic method outperforms both word-embedding methods. Moreover, the word-embedding approaches were unable to model a meaningful distinction between skill-terms and non-skill-terms, whereas our syntactic approach made this distinction successfully. We also show how the extracted skills can be used to automatically construct a semantic job-skills ontology and to facilitate a job-to-candidate matching system.

Notes

1. This work was funded by CTI/innosuisse under contract no. 27177.2 PFES-ES.

Acknowledgements

We thank our partners at Skillue AG, Basel, Switzerland, for their contributions to this work.

Author information

Authors and Affiliations

Zürich University of Applied Sciences, Winterthur, Switzerland
Ellery Smith, Andreas Weiler & Martin Braschler

Corresponding author

Correspondence to Ellery Smith.

Editor information

Editors and Affiliations

Arizona State University, Tempe, AZ, USA
K. Selçuk Candan
Politehnica University of Bucharest, Bucharest, Romania
Bogdan Ionescu
Université Grenoble Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Aalborg University Copenhagen, Copenhagen, Denmark
Birger Larsen
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Montpellier, Montpellier, France
Alexis Joly
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
TU Wien, Vienna, Austria
Florina Piroi
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro

About this paper

Cite this paper

Smith, E., Weiler, A., Braschler, M. (2021). Skill Extraction for Domain-Specific Text Retrieval in a Job-Matching Platform. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_10

DOI: https://doi.org/10.1007/978-3-030-85251-1_10
Published: 14 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science, Computer Science (R0)
