Abstract
We discuss a domain-specific retrieval application for matching job seekers with open positions. It uses a novel syntactic method to extract skill-terms from the text of natural language job advertisements. We contrast this method with two word-embedding methods based on word2vec. We define the notion of a skill headword and present an algorithm that learns syntactic dependency patterns to recognize skill-terms. On all metrics, our syntactic method outperforms both word-embedding methods. Moreover, the word-embedding approaches were unable to model a meaningful distinction between skill-terms and non-skill-terms, whereas our syntactic approach drew this distinction successfully. We also show how the extracted skills can be used to automatically construct a semantic job-skills ontology and to support a job-to-candidate matching system.
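The abstract does not spell out the pattern-learning algorithm, so the following is only a minimal sketch of the general idea of learning dependency patterns from labelled skill-terms. Everything in it is an assumption made for illustration: the hand-written dependency edges, the seed list, the `min_support` threshold, and the function names `learn_patterns` and `extract_skills` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Each sentence is a list of (head_word, relation, dependent_word) edges,
# as a dependency parser might produce. These parses are hand-written toy
# data, not output from the parser or corpus used in the paper.
TRAIN = [
    [("experience", "nmod", "Java"), ("have", "dobj", "experience")],
    [("experience", "nmod", "Python"), ("require", "dobj", "experience")],
    [("knowledge", "nmod", "SQL"), ("need", "dobj", "knowledge")],
    [("located", "nmod", "Basel"), ("office", "acl", "located")],
]
SEED_SKILLS = {"Java", "Python", "SQL"}  # hypothetical labelled skill-terms


def learn_patterns(sentences, seeds, min_support=2):
    """Count the (head, relation) contexts governing seed skills; keep frequent ones."""
    counts = Counter(
        (head, rel)
        for edges in sentences
        for head, rel, dep in edges
        if dep in seeds
    )
    return {pattern for pattern, n in counts.items() if n >= min_support}


def extract_skills(edges, patterns):
    """Return dependents whose (head, relation) context matches a learned pattern."""
    return [dep for head, rel, dep in edges if (head, rel) in patterns]


patterns = learn_patterns(TRAIN, SEED_SKILLS)
new_sentence = [("experience", "nmod", "Kotlin"), ("office", "nmod", "Zurich")]
found = extract_skills(new_sentence, patterns)
print(patterns, found)
```

In this toy setup a pattern is just a governing word plus a relation label. A realistic system would presumably work on lemmas, longer dependency paths, and multi-word terms.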
Notes
1. This work was funded by CTI/innosuisse under contract no. 27177.2 PFES-ES.
Acknowledgements
We thank our partners at Skillue AG, Basel, Switzerland, for their contributions to this work.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Smith, E., Weiler, A., Braschler, M. (2021). Skill Extraction for Domain-Specific Text Retrieval in a Job-Matching Platform. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_10
DOI: https://doi.org/10.1007/978-3-030-85251-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science, Computer Science (R0)