DOI: 10.1145/3632410.3632454

Expertise Identification Using Transformers

Published: 04 January 2024

Abstract

Expertise Identification involves extracting the expertise or skills of a person from a set of documents related to their work. This has many important applications in large multi-disciplinary organizations such as ours. Most existing approaches to Expertise Identification apply unsupervised learning techniques, such as those based on TF-IDF, to extract keyphrases from the documents, which are then used as expertise. However, keyphrases represent the main ideas covered in a document, whereas expertise should be more domain-specific and detailed to be practically usable. Moreover, these unsupervised techniques cannot extract expertise that is not explicitly mentioned in the documents. We cast the Expertise Identification problem as an abstractive text generation problem and use supervised learning with transformer-based language models to solve it. We also show that existing metrics based on exact syntactic match between the ground-truth expertise and the predicted expertise are not suitable for evaluating Expertise Identification techniques; instead, we propose an evaluation metric based on semantic similarity. Experiments reveal that our transformer-based approach clearly outperforms the unsupervised learning techniques.
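
The paper frames this as conditional text generation: a transformer encoder-decoder reads a document and generates expertise phrases that need not appear verbatim in the text. Below is a minimal sketch of what the inference step could look like with the HuggingFace transformers library, consistent with the BART author tag; the checkpoint name, the comma-separated output format, and the decoding settings are illustrative assumptions rather than details taken from the paper.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Placeholder checkpoint: the paper fine-tunes a BART-style model on
    # (document, expertise) pairs but does not publish a checkpoint name here.
    MODEL_NAME = "facebook/bart-base"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def generate_expertise(document: str, max_new_tokens: int = 64) -> list[str]:
        """Generate expertise for one document (assuming the model was trained
        to emit a comma-separated list) and split it into individual phrases."""
        inputs = tokenizer(document, truncation=True, max_length=1024,
                           return_tensors="pt")
        output_ids = model.generate(**inputs, num_beams=4,
                                    max_new_tokens=max_new_tokens)
        text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        return [phrase.strip() for phrase in text.split(",") if phrase.strip()]

Because the decoder generates freely rather than copying spans, such a model can in principle output an expertise like "statistical machine translation" even when the document only ever discusses specific MT systems, which is the gap the abstract identifies in extractive, TF-IDF-style methods.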

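For evaluation, the abstract argues that exact syntactic match between predicted and ground-truth expertise is too strict and proposes a semantic-similarity metric instead. The sketch below shows one plausible soft-match score, assuming the sentence-transformers library and an off-the-shelf encoder; the scoring scheme (best cosine match per ground-truth phrase, averaged) and the model name are assumptions, and the paper's actual metric may be defined differently.

    from sentence_transformers import SentenceTransformer, util

    # Assumed off-the-shelf encoder; not specified by the paper.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def semantic_match_score(predicted: list[str], ground_truth: list[str]) -> float:
        """For each ground-truth expertise, take the cosine similarity of its
        best match among the predictions, then average over the ground truth
        (a recall-oriented soft match)."""
        pred_emb = encoder.encode(predicted, convert_to_tensor=True)
        gold_emb = encoder.encode(ground_truth, convert_to_tensor=True)
        sims = util.cos_sim(gold_emb, pred_emb)  # shape: (|gold|, |pred|)
        return sims.max(dim=1).values.mean().item()

Under such a metric, a prediction like "deep learning for NLP" can still score highly against a ground-truth phrase like "natural language processing", which an exact-match metric would count as a complete miss.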


Published In

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BART
  2. Expertise Identification
  3. Semantic Similarity
  4. Transformers

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

CODS-COMAD 2024
