DOI: 10.1145/3477495.3531713
short-paper

On Extractive Summarization for Profile-centric Neural Expert Search in Academia

Published: 07 July 2022

Abstract

Identifying academic experts is crucial for the progress of science, enabling researchers to connect, form networks, and collaborate on the most pressing research problems. A key challenge for ranking experts in response to a query is how to infer their expertise from the publications they coauthored. Profile-centric approaches represent candidate experts by concatenating all their publications into a text-based profile. Despite offering a complete picture of each candidate's scientific output, such lengthy profiles make it inefficient to leverage state-of-the-art neural architectures for inferring expertise. To overcome this limitation, we investigate the suitability of extractive summarization as a mechanism to reduce candidate profiles for semantic encoding using Transformers. Our thorough experiments with a representative academic search test collection demonstrate the benefits of encoding summarized profiles for an improved expertise inference.
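To make the idea of reducing a concatenated profile concrete, the following is a minimal sketch, not the summarization method evaluated in the paper. It assumes a simple word-frequency sentence scorer, a naive regex sentence splitter, and a budget counted in whitespace-level words as a rough stand-in for a Transformer's subword token limit. The function names (`build_profile`, `summarize_profile`) and the default budget of 512 are illustrative assumptions.

```python
import re
from collections import Counter


def build_profile(publications):
    """Concatenate a candidate's publication texts into one profile string."""
    return " ".join(p.strip() for p in publications if p.strip())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased alphanumeric words (an approximation of model tokens)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize_profile(profile, max_tokens=512):
    """Greedy extractive summary that fits within max_tokens words.

    Each sentence is scored by the average profile-wide frequency of its
    words (an assumed scorer, chosen for simplicity). Sentences are picked
    in descending score order while they fit the budget, then returned in
    their original order.
    """
    sentences = split_sentences(profile)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokens for t in toks)

    def score(toks):
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokens[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(tokens[i])
        if n and used + n <= max_tokens:
            chosen.append(i)
            used += n
    return " ".join(sentences[i] for i in sorted(chosen))
```

The reduced profile could then be passed to any sentence or document encoder in place of the full concatenation. A real pipeline would likely count tokens with the encoder's own tokenizer rather than with this word-level approximation.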

Supplementary Material

MP4 File (SIGIR2022.mp4)
Presentation video


Cited By

  • (2024) Improving expert search effectiveness: Comparing ways to rank and present search results. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, 56-65. DOI: 10.1145/3627508.3638296. Online publication date: 10-Mar-2024
  • (2023) A Topic-aware Summarization Framework with Different Modal Side Information. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1416-1425. DOI: 10.1145/3539618.3591630. Online publication date: 19-Jul-2023

    Published In

    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. expert search
    3. extractive summarization
    4. information retrieval

    Qualifiers

    • Short-paper

    Conference

    SIGIR '22
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
