DOI: 10.1145/3632410.3632454
short-paper

Expertise Identification Using Transformers

Published: 04 January 2024

ABSTRACT

Expertise Identification involves extracting the expertise or skills of a person from a set of documents related to their work. This has many important applications in large multi-disciplinary organizations such as ours. Most existing approaches to Expertise Identification apply unsupervised learning techniques, such as those based on TF-IDF, to extract keyphrases from the documents, which are then used as expertise. However, keyphrases represent the main ideas covered in a document, whereas expertise should be more domain-specific and detailed to be practically usable. Moreover, these unsupervised techniques fail to extract expertise that is not explicitly present in the documents. We cast the Expertise Identification problem as an abstractive text generation problem and solve it with supervised learning using transformer-based language models. We also show that existing metrics based on an exact syntactic match between the ground-truth expertise and the predicted expertise are not suitable for evaluating Expertise Identification techniques; instead, we propose an evaluation metric based on semantic similarity. Experiments reveal that our transformer-based approach clearly outperforms the unsupervised learning techniques.
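To illustrate the evaluation idea, here is a minimal sketch of a semantic-similarity score for comparing predicted expertise phrases against ground truth. It is not the paper's actual metric: for self-containment it uses bag-of-words cosine similarity as a stand-in for embedding-based similarity, and the aggregation (best match per prediction, averaged) is an assumption.

```python
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    # Cosine similarity between bag-of-words vectors of two phrases.
    # A real system would likely use contextual embeddings instead.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_match_score(predicted, ground_truth):
    # For each predicted expertise phrase, take its best similarity
    # against any ground-truth phrase, then average over predictions.
    # Unlike exact string match, near-paraphrases still earn credit.
    if not predicted:
        return 0.0
    best = [max(cosine(p, g) for g in ground_truth) for p in predicted]
    return sum(best) / len(best)

truth = ["deep learning", "natural language processing"]
pred = ["deep neural learning", "language processing"]
print(round(semantic_match_score(pred, truth), 3))  # nonzero despite no exact match
```

Under exact syntactic match, both predictions above would score zero even though each is a close paraphrase of a ground-truth phrase, which is precisely the shortcoming the semantic metric addresses.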


Published in

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024, 627 pages

Copyright © 2024 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher: Association for Computing Machinery, New York, NY, United States
