research-article

LiteratureQA: A Qestion Answering Corpus with Graph Knowledge on Academic Literature

Authors:

Xinbing Wang

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 4623 - 4632

https://doi.org/10.1145/3459637.3482007

Published: 30 October 2021

Abstract

In this paper, we introduce LiteratureQA, a large question answering (QA) corpus consisting of publicly available academic papers. Unlike other QA corpora, LiteratureQA poses unique challenges, such as how to leverage the structured knowledge of citation networks. We further examine several popular QA methods and present a benchmark approach to answering academic questions that combines semantic text and graph knowledge to improve the prevalent pre-trained model. We hope this resource will support research and development on machine reading of academic text.

Supplementary Material

MP4 File (Video 2245.mp4)



Cited By

Chen X, Wang T, Guo T, Guo K, Zhou J, Li H, Song Z, Gao X, Zhang X. (2025). Unveiling the power of language models in chemical research question answering. Communications Chemistry 8:1. Online publication date: 5-Jan-2025.
https://doi.org/10.1038/s42004-024-01394-x
Wu Y, Zhou J. (2022). EG-KGR: A Knowledge Graph Reasoning Model Based on Enhanced Graph Sample and Aggregate Inductive Learning Algorithm. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 347-354. Online publication date: Oct-2022.
https://doi.org/10.1109/ICTAI56018.2022.00057

Index Terms

LiteratureQA: A Qestion Answering Corpus with Graph Knowledge on Academic Literature
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Data cleaning


Published In


CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021

4966 pages

ISBN:9781450384469

DOI:10.1145/3459637

General Chairs:
Gianluca Demartini, The University of Queensland, Australia
Guido Zuccon, The University of Queensland, Australia

Program Chairs:
J. Shane Culpepper, RMIT University, Australia
Zi Huang, The University of Queensland, Australia
Hanghang Tong, University of Illinois at Urbana-Champaign, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Shanghai Academic/Technology Research Leader Program
2021 Tencent AI Lab Rhino-Bird Focused Research Program
National Key R&D Program of China
NSF China

Conference

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Total Citations: 2
Total Downloads: 204
Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 2

Reflects downloads up to 14 Feb 2025
