ABSTRACT
Expertise Identification involves extracting the expertise or skills of a person from a set of documents related to their work. This has many important applications in large multi-disciplinary organizations such as ours. Most existing approaches to Expertise Identification apply unsupervised learning techniques, such as those based on TF-IDF, to extract keyphrases from the documents, which are then used as expertise. However, keyphrases represent the main ideas covered in a document, whereas expertise should be more domain-specific and detailed to be practically usable. Moreover, these unsupervised techniques fail to extract expertise that is not explicitly present in the documents. We cast the Expertise Identification problem as an abstractive text generation problem and use supervised learning with transformer-based language models to solve it. We also show that existing metrics based on exact syntactic match between ground-truth and predicted expertise are unsuitable for evaluating Expertise Identification techniques; instead, we propose an evaluation metric based on semantic similarity. Experiments reveal that our transformer-based approach clearly outperforms the unsupervised learning techniques.
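As a rough sketch of what a semantic-similarity evaluation metric could look like (the abstract does not specify the exact metric, so the matching scheme and function names below are illustrative assumptions): each predicted expertise phrase is scored by its best cosine similarity against any ground-truth expertise phrase, using embedding vectors that would in practice come from a sentence encoder such as BERT. The toy vectors here stand in for such embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_match_score(gold_embeddings, pred_embeddings):
    """For each predicted expertise, take its best cosine similarity
    against any ground-truth expertise, then average the per-prediction
    scores (a greedy matching scheme, assumed here for illustration)."""
    scores = [max(cosine_similarity(p, g) for g in gold_embeddings)
              for p in pred_embeddings]
    return sum(scores) / len(scores)


# Toy 3-dimensional "embeddings" for illustration only; real embeddings
# would be produced by a transformer encoder.
gold = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pred = [[0.9, 0.1, 0.0]]  # semantically close to the first gold phrase
print(round(semantic_match_score(gold, pred), 3))
```

Unlike exact syntactic match, this score rewards a prediction that paraphrases a ground-truth expertise, which is the behavior the abstract argues for.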
Index Terms
- Expertise Identification Using Transformers