Efficient query processing for XML keyword queries based on the IDList index

  • Regular Paper
  • Published in: The VLDB Journal

Abstract

Keyword search over XML data has attracted considerable research effort over the last decade, and one of its fundamental problems is how to efficiently answer a keyword query with respect to a given query semantics. We find that the key cause of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, the IDList: the IDList for keyword \(k\) consists of the ordered nodes that directly or indirectly contain \(k\). We then show that finding keyword query results under the smallest lowest common ancestor (SLCA) and exclusive lowest common ancestor (ELCA) semantics can be reduced to the ordered set intersection problem, which has been heavily optimized because of its applications in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify finding the nodes common to all involved IDLists. We have conducted an extensive set of experiments comparing against many state-of-the-art algorithms on several large-scale datasets. The results show that our methods outperform existing methods by up to two orders of magnitude in many cases.
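As a rough illustration of the reduction described above, the following minimal sketch builds IDLists for a toy tree, intersects them by probing hash sets, and keeps each common node that has no common descendant (the SLCA condition). It assumes the tree is given as a parent array whose node IDs follow document (preorder) order and that each keyword's direct matches are already known. The names `build_idlist`, `intersect_by_hash`, and `slca` are hypothetical, the plain list of node IDs omits any extra per-entry information the authors' IDList may store, and this is not the paper's algorithm.

```python
def subtree_sizes(parent):
    """Number of nodes in each subtree. parent[v] is v's parent (-1 for the
    root); IDs are assumed to follow document (preorder) order, so a parent
    always has a smaller ID than its children."""
    size = [1] * len(parent)
    for v in range(len(parent) - 1, 0, -1):
        size[parent[v]] += size[v]
    return size


def build_idlist(parent, matches):
    """Sorted IDs of every node that directly or indirectly contains the
    keyword: each direct match plus all of its ancestors."""
    nodes = set()
    for v in matches:
        # Stop early at a node already added: its ancestors are in the set too.
        while v != -1 and v not in nodes:
            nodes.add(v)
            v = parent[v]
    return sorted(nodes)


def intersect_by_hash(idlists):
    """Probe each node of the shortest IDList against hash sets built from
    the other lists; the result stays in document order."""
    if not idlists:
        return []
    lists = sorted(idlists, key=len)
    others = [set(lst) for lst in lists[1:]]
    return [v for v in lists[0] if all(v in s for s in others)]


def slca(parent, keyword_matches):
    """SLCA nodes: nodes common to all IDLists with no common descendant."""
    size = subtree_sizes(parent)
    common = intersect_by_hash([build_idlist(parent, m) for m in keyword_matches])
    result = []
    for i, v in enumerate(common):
        # In preorder, v's descendants are exactly IDs v+1 .. v+size[v]-1,
        # so v is an SLCA iff the next common node falls outside that range.
        if i + 1 == len(common) or common[i + 1] >= v + size[v]:
            result.append(v)
    return result


if __name__ == "__main__":
    # Toy document: 0 root; 1 paper{2 title "xml", 3 author "zhou"};
    #               4 paper{5 title "xml keyword", 6 author "bao"}
    parent = [-1, 0, 1, 1, 0, 4, 4]
    xml, zhou, keyword = [2, 5], [3], [5]
    print(slca(parent, [xml, zhou]))     # [1]
    print(slca(parent, [xml, keyword]))  # [5]
```

Because every ancestor of a common node is itself common, the intersection repeats the shared ancestors only once, and the SLCA filter reduces to a single scan over the sorted intersection.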

[Figs. 1–21 not reproduced in this preview]

Notes

  1. If all lists are sorted in ascending order, the element in \(L_{i}\) matched to eliminator \(e\) is the smallest element that is greater than or equal to \(e\).

  2. http://www.monetdb.org/Home.

  3. http://www.informatik.uni-trier.de/~ley/db/.

  4. http://www.oracle.com/technetwork/products/berkeleydb/overview/index.html.

  5. For a fair comparison, Fig. 18 uses the result selectivity of our method as the result selectivity of the existing methods.
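Note 1 describes how an eliminator \(e\) is matched in each sorted list. A minimal sketch of an eliminator-driven intersection over ascending lists follows, assuming the lists are duplicate-free and using exponential (galloping) search followed by binary search to find the smallest element greater than or equal to \(e\). The names `gallop_lower_bound` and `eliminator_intersect` are hypothetical; this is a generic adaptive-intersection sketch, not the authors' implementation.

```python
from bisect import bisect_left


def gallop_lower_bound(lst, target, lo):
    """Index of the smallest element >= target in lst[lo:] (len(lst) if none),
    found by doubling the step and then binary searching."""
    n = len(lst)
    if lo >= n or lst[lo] >= target:
        return lo
    # Invariant: lst[lo] < target. Double the step until we overshoot.
    step = 1
    hi = lo + 1
    while hi < n and lst[hi] < target:
        lo = hi
        step *= 2
        hi = lo + step
    # Now lst[lo] < target and (hi >= n or lst[hi] >= target).
    return bisect_left(lst, target, lo + 1, min(hi, n))


def eliminator_intersect(lists):
    """Intersect ascending, duplicate-free lists by cycling an eliminator e
    through them; in each list, e is matched to the smallest element >= e."""
    if not lists or any(len(lst) == 0 for lst in lists):
        return []
    k = len(lists)
    if k == 1:
        return list(lists[0])
    pos = [0] * k      # per-list search start; only ever moves forward
    e = lists[0][0]    # current eliminator
    matched = 1        # consecutive lists known to contain e
    i = 1              # next list to probe
    result = []
    while True:
        p = gallop_lower_bound(lists[i], e, pos[i])
        if p == len(lists[i]):
            break      # list i has nothing >= e, so no further results
        pos[i] = p
        x = lists[i][p]
        if x == e:
            matched += 1
            if matched == k:        # e occurs in every list
                result.append(e)
                pos[i] += 1         # next element of list i becomes the eliminator
                if pos[i] == len(lists[i]):
                    break
                e = lists[i][pos[i]]
                matched = 1
        else:
            e = x                   # x > e, so x becomes the new eliminator
            matched = 1
        i = (i + 1) % k
    return result
```

Because each per-list position only moves forward and the search step doubles, a long run of non-matching elements is skipped with a logarithmic number of comparisons rather than one comparison per element.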

Acknowledgments

This research was partially supported by grants from the Natural Science Foundation of China (Nos. 61073060, 60833005, 61070055, 91024032, 91124001), the National Science and Technology Major Project (No. 2010-ZX01042-002-003), the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University (Nos. 11XNL010, 10XNI018), and the Research Funds from the Education Department of Hebei Province (No. Y2012014). Zhifeng Bao’s research is carried out at the SeSaMe Centre. It is supported by the Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO. Wei Wang was partially supported by ARC DP130103401 and DP130103405.

Author information

Corresponding author

Correspondence to Junfeng Zhou.

Electronic supplementary material

Supplementary material 1 (pdf 422 KB)

About this article

Cite this article

Zhou, J., Bao, Z., Wang, W. et al. Efficient query processing for XML keyword queries based on the IDList index. The VLDB Journal 23, 25–50 (2014). https://doi.org/10.1007/s00778-013-0313-2

  • DOI: https://doi.org/10.1007/s00778-013-0313-2
