Keyword Query over Error-Tolerant Knowledge Bases

  • Regular Paper
Journal of Computer Science and Technology

Abstract

As the Web provides more and more knowledge, querying and mining knowledge bases have attracted much research attention. Among the queries over knowledge bases, which are usually modelled as graphs, the keyword query is the most widely used. Although keyword query over graphs has been studied in depth for years, knowledge bases are special, error-tolerant graphs, and the results of traditionally defined keyword queries over them often fail to satisfy users. In this paper, we therefore define a new keyword query specific to knowledge bases, called the confident r-clique. It builds on the r-clique definition for keyword query on general graphs, which has been shown to be the best existing definition. However, as we prove in the paper, finding confident r-cliques is #P-hard. We thus propose a filtering-and-verification framework to improve search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique and design an index, together with its search algorithm, that suits the large scale of knowledge bases. In the verification phase, we develop an efficient sampling method to verify the final answers among the candidates that survive the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy users’ requirements better than those of the traditional r-clique definition, and that our algorithms are efficient.
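To make the verification phase more concrete, the following is a minimal Python sketch of sampling-based verification. It is not the paper's algorithm: the abstract does not give the formal definition of a confident r-clique, so the sketch assumes that each edge of the knowledge base carries an independent existence probability, that distance is measured in hops, and that a candidate node set (already covering all query keywords) counts as an r-clique in a sampled graph when every pair of its nodes is within distance r. The names `hop_distance`, `is_r_clique`, `sample_world`, and `estimate_confidence` are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def hop_distance(adj, source, target, r):
    """Return the BFS hop distance from source to target, or None if it exceeds r."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == r:
            continue  # do not expand beyond r hops
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def is_r_clique(adj, nodes, r):
    """Assumed test: every pair of candidate nodes is within r hops of each other."""
    return all(hop_distance(adj, u, v, r) is not None
               for u, v in combinations(nodes, 2))


def sample_world(edges, rng):
    """Draw one deterministic graph; each edge is kept with its own probability."""
    adj = {}
    for (u, v), p in edges.items():
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def estimate_confidence(edges, nodes, r, samples=2000, seed=0):
    """Monte Carlo estimate of the probability that `nodes` form an r-clique."""
    rng = random.Random(seed)
    hits = sum(is_r_clique(sample_world(edges, rng), nodes, r)
               for _ in range(samples))
    return hits / samples
```

Under these assumptions, a candidate that survives filtering would be accepted when its estimated confidence reaches a user-chosen threshold. Exact computation would have to enumerate exponentially many sampled graphs, which is consistent with the #P-hardness stated above and is why sampling is a natural fit for the verification step.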

Author information

Corresponding author

Correspondence to Ye Yuan.

Cite this article

Cheng, YR., Yuan, Y., Li, JY. et al. Keyword Query over Error-Tolerant Knowledge Bases. J. Comput. Sci. Technol. 31, 702–719 (2016). https://doi.org/10.1007/s11390-016-1658-y

  • DOI: https://doi.org/10.1007/s11390-016-1658-y
