short-paper

On Extractive Summarization for Profile-centric Neural Expert Search in Academia

Authors:

Rennan C. Lima,

Rodrygo L. T. Santos

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2331 - 2335

https://doi.org/10.1145/3477495.3531713

Published: 07 July 2022

Abstract

Identifying academic experts is crucial for the progress of science, enabling researchers to connect, form networks, and collaborate on the most pressing research problems. A key challenge for ranking experts in response to a query is how to infer their expertise from the publications they coauthored. Profile-centric approaches represent candidate experts by concatenating all their publications into a text-based profile. Despite offering a complete picture of each candidate's scientific output, such lengthy profiles make it inefficient to leverage state-of-the-art neural architectures for inferring expertise. To overcome this limitation, we investigate the suitability of extractive summarization as a mechanism to reduce candidate profiles for semantic encoding using Transformers. Our thorough experiments with a representative academic search test collection demonstrate the benefits of encoding summarized profiles for an improved expertise inference.
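The pipeline outlined in the abstract (concatenate a candidate's publications into a profile, then shrink the profile extractively so that it fits a Transformer's input limit) can be illustrated with a minimal sketch. This is not the authors' method: the word-frequency sentence scoring, the regex-based tokenizer and sentence splitter, the function names, and the 512-token default budget are all assumptions chosen for illustration, and the semantic encoding step itself is left out.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercased alphanumeric tokens; a stand-in for a real subword tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize_profile(publications, max_tokens=512):
    """Keep the highest-scoring sentences that fit a hypothetical token budget."""
    # Profile-centric representation: concatenate all publications.
    profile = " ".join(publications)
    sentences = split_sentences(profile)
    freq = Counter(tokenize(profile))  # term frequencies over the whole profile

    def score(sentence):
        tokens = tokenize(sentence)
        # Average profile-level frequency of the sentence's terms.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, highest first (ties keep original order).
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))

    chosen, used = [], 0
    for i in ranked:
        n = len(tokenize(sentences[i]))
        if used + n <= max_tokens:
            chosen.append(i)
            used += n

    # Restore document order so the summary reads coherently.
    return " ".join(sentences[i] for i in sorted(chosen))


publications = [
    "Neural ranking models for expert search. We study neural ranking.",
    "A note on cooking pasta.",
]
summary = summarize_profile(publications, max_tokens=10)
```

With this toy input and a 10-token budget, the two sentences about neural ranking are kept and the off-topic sentence is dropped, because its terms are rare across the profile. The reduced profile would then be handed to an encoder of choice in place of the full concatenation.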




Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Language models


Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN:9781450387323

DOI:10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
