ABSTRACT
A challenging task in natural language question answering (Q/A for short) over RDF knowledge graphs is how to bridge the gap between unstructured natural language questions (NLQ) and graph-structured RDF data (G). One of the effective tools is the "template", which is used in many existing RDF Q/A systems. However, few of them study how to generate templates automatically. To the best of our knowledge, we are the first to propose a join approach for template generation. Given a workload D of SPARQL queries and a set N of natural language questions, the goal is to find pairs ⟨q, n⟩, with q ∈ D and n ∈ N, where SPARQL query q is the best match for natural language question n. These pairs provide promising hints for automatic template generation. Due to the ambiguity of natural language, we model the problem above as an uncertain graph join task. We propose several structural and probabilistic pruning techniques to speed up the join. Extensive experiments over real RDF Q/A benchmark datasets confirm both the effectiveness and efficiency of our approach.
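The join described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each graph is reduced to a set of edge labels, that an ambiguous question is a short list of candidate interpretations with probabilities, and that similarity is a plain Jaccard overlap. The names `jaccard`, `expected_similarity`, and `best_match_join`, and all sample data, are hypothetical. The abstract does not say which similarity measure or pruning rules are used, and none are reproduced here.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a graph is reduced to the set of its edge labels.
Graph = FrozenSet[str]
# Assumption: an uncertain graph is a list of (probability, interpretation) pairs.
UncertainGraph = List[Tuple[float, Graph]]


def jaccard(a: Graph, b: Graph) -> float:
    """Toy similarity: overlap of edge labels. A stand-in, not the paper's measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def expected_similarity(q: Graph, n: UncertainGraph) -> float:
    """Probability-weighted similarity across the question's interpretations."""
    return sum(p * jaccard(q, world) for p, world in n)


def best_match_join(D: Dict[str, Graph], N: Dict[str, UncertainGraph]) -> Dict[str, str]:
    """For each question in N, return the id of its best-matching query in D.

    Brute force over all pairs; no structural or probabilistic pruning.
    """
    return {
        n_id: max(D, key=lambda q_id: expected_similarity(D[q_id], n))
        for n_id, n in N.items()
    }


# Hypothetical workload of SPARQL queries, reduced to predicate sets.
D = {
    "q1": frozenset({"director", "starring"}),
    "q2": frozenset({"birthPlace"}),
}
# Hypothetical questions, each with two candidate interpretations.
N = {
    "n1": [(0.7, frozenset({"director", "starring"})),
           (0.3, frozenset({"producer", "starring"}))],
    "n2": [(0.6, frozenset({"birthPlace"})),
           (0.4, frozenset({"deathPlace"}))],
}

pairs = best_match_join(D, N)
print(pairs)  # {'n1': 'q1', 'n2': 'q2'}
```

The brute-force loop compares every query with every question, which is the cost that pruning techniques such as those mentioned in the abstract are meant to reduce.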
Index Terms
- How to Build Templates for RDF Question/Answering: An Uncertain Graph Similarity Join Approach