Abstract
Complex question answering (CQA) is widely used in real-world applications such as search engines and intelligent customer service. With the development of large-scale knowledge bases, CQA over knowledge bases has attracted considerable attention in recent years. However, complex questions come in many types, and few works analyse in depth how models perform on each type. Another major challenge is the lack of complete supervised labels: manual labelling is expensive, and without such labels models are less interpretable and harder to train. In this paper, we construct a dataset, named CoSuQue, which includes multiple types of complex questions together with complete supervised labels that are easily obtained. Our work provides an in-depth analysis of a model’s ability to answer different types of questions, contributing a comprehensive evaluation of the performance of CQA models. Knowing how a model handles each type of question allows its structure to be improved in a more targeted manner, and the different question types and complete supervised labels allow the model’s inference process to be investigated. Furthermore, we propose a novel training method that leverages the proposed dataset to improve model performance on other publicly available datasets. Experiments on the Complex WebQuestions and WebQuestionsSP datasets demonstrate the effectiveness of our approach on the CQA task.
Availability of data and materials
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Code availability
Some or all of the models or code that support the findings of this study are available from the corresponding author upon reasonable request.
Notes
The knowledge base can be downloaded from https://developers.google.com/freebase/.
Acknowledgements
This research was supported by the Fundamental Research Funds for the Central Universities (Grant number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).
Funding
This research was funded by the Fundamental Research Funds for the Central Universities (Grant number 2020YJS012) and the National Key R&D Program of China (No. 2018YFC0832300; No. 2018YFC0832303).
Author information
Contributions
XC, YZ and BS designed the study; XC performed the experiments, analysed the data, and wrote the manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cao, X., Zhao, Y. & Shen, B. Improving and evaluating complex question answering over knowledge bases by constructing strongly supervised data. Neural Comput & Applic 35, 5513–5533 (2023). https://doi.org/10.1007/s00521-022-07965-0
DOI: https://doi.org/10.1007/s00521-022-07965-0