ABSTRACT
Transformer-based models have dramatically improved performance on a variety of natural language processing tasks, such as question answering, fact verification, topic-driven summarization, and natural language inference. However, these models cannot process input longer than their token-length limit (TLL) at a time. In a long document, the required context may be spread over a wide area and need not be restricted to contiguous sentences, and existing methods fail to handle such situations correctly. In this paper, we propose a method that addresses this issue by detecting the right context within a long document before performing the actual query-context text-pair task. The proposed method fragments a long document into sub-texts and then employs a cross-encoder model to generate a query-focused relevance score for each sub-text. The downstream task is then performed with the most relevant sub-text as the context, rather than with an arbitrarily selected set of top sentences. This frees the model from the traditional approach of iterating over TLL-sized text windows and reduces computational cost. The efficacy of the approach is established on multiple tasks, on which the proposed model outperforms several state-of-the-art models by a significant margin.