Row-based hierarchical graph network for multi-hop question answering over textual and tabular data

Yang, Peng; Li, Wenjun; Zhao, Guangzhen; Zha, Xianyu

doi:10.1007/s11227-022-05035-9

Published: 20 January 2023

Volume 79, pages 9795–9818, (2023)

The Journal of Supercomputing

Abstract

Multi-hop question answering (QA) over heterogeneous data is a challenging task in Natural Language Processing (NLP), which aims to find the answer across heterogeneous data sources by following reasoning chains. In complex reasoning scenarios, most existing QA systems can handle only specific types of data. To address this limitation, we propose the Row Hierarchical Graph Network (RHGN), which performs multi-hop QA over both textual and tabular data. RHGN consists of two phases: a row selection phase, which finds the table row most likely to contain the answer, and a row reading comprehension phase, which locates the final answer within that row. In the row selection phase, a retriever gathers the supporting evidence related to the question, and a pre-trained language model selects the answer row. In the row reading comprehension phase, a row-based hierarchical graph network captures the structural information, and a gated mechanism performs graph reasoning. The final answer is then obtained through three interrelated sub-tasks. Experimental results demonstrate the effectiveness of RHGN, which achieves superior performance on the HybridQA dataset.
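The two-phase control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Row`, `tokens`, `select_row`, `read_row` and `answer_question` are hypothetical, plain token overlap stands in for the retriever and the pre-trained language model, and a header-matching heuristic stands in for the row-based hierarchical graph network and its gated reasoning. The sketch only returns cell values, whereas a real system would also extract answers from linked passages.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Row:
    """One table row: header -> cell value, plus passages linked from its cells."""
    cells: Dict[str, str]
    passages: List[str] = field(default_factory=list)


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_row(question: str, rows: List[Row]) -> Row:
    """Phase 1 (row selection): pick the row whose cells and linked passages
    share the most tokens with the question.
    ASSUMPTION: token overlap replaces the retriever and the language model."""
    q = tokens(question)

    def score(row: Row) -> int:
        evidence = " ".join(list(row.cells.values()) + row.passages)
        return len(q & tokens(evidence))

    return max(rows, key=score)


def read_row(question: str, row: Row) -> str:
    """Phase 2 (row reading comprehension): return one cell value as the answer.
    ASSUMPTION: a header-matching heuristic replaces the graph network."""
    q = tokens(question)
    # Skip cells whose value is already stated in the question.
    candidates = {h: v for h, v in row.cells.items() if not tokens(v) <= q}
    if not candidates:
        candidates = dict(row.cells)
    # Prefer the cell whose column header overlaps most with the question.
    best_header = max(candidates, key=lambda h: len(q & tokens(h)))
    return candidates[best_header]


def answer_question(question: str, rows: List[Row]) -> str:
    """Run both phases in sequence: select the answer row, then read it."""
    return read_row(question, select_row(question, rows))


# Toy table invented for illustration only.
rows = [
    Row({"Player": "Ana Silva", "Club": "Porto", "Year": "2019"},
        ["Porto is a football club based in Portugal."]),
    Row({"Player": "Ben Cole", "Club": "Leeds", "Year": "2021"},
        ["Leeds is a city in England."]),
]
print(answer_question("Which club did Ana Silva join in 2019?", rows))  # prints "Porto"
```

Splitting the work this way means the second phase only has to reason over a single row and its linked passages rather than the whole table, which is the motivation the abstract gives for selecting the answer row first.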

Data availability

Some data, models, and code generated or used during the study will be made available by the corresponding author upon reasonable request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62272100, and in part by the Fundamental Research Funds for the Central Universities and the Academy-Locality Cooperation Project of Chinese Academy of Engineering under Grant JS2021ZT05.

Author information

Authors and Affiliations

Key Laboratory of Computer Network and Information Integration (Southeast University, Ministry of Education), School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, China
Peng Yang, Wenjun Li, Guangzhen Zhao & Xianyu Zha

Author notes

Wenjun Li contributed equally to this work.

Corresponding author

Correspondence to Peng Yang.

Ethics declarations

Competing interest

The authors declare that they have no competing interests.

Ethics approval

All authors read and approved the final version of the manuscript.

Consent to participate

All authors contributed to this work.

Consent for publication

All authors have checked the manuscript and have agreed to the submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 298 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, P., Li, W., Zhao, G. et al. Row-based hierarchical graph network for multi-hop question answering over textual and tabular data. J Supercomput 79, 9795–9818 (2023). https://doi.org/10.1007/s11227-022-05035-9

Accepted: 29 December 2022
Published: 20 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11227-022-05035-9
