Efficient query processing for XML keyword queries based on the IDList index

Zhou, Junfeng; Bao, Zhifeng; Wang, Wei; Zhao, Jinjia; Meng, Xiaofeng

doi:10.1007/s00778-013-0313-2

Regular Paper
Published: 01 May 2013

The VLDB Journal, Volume 23, pages 25–50 (2014)

Abstract

Keyword search over XML data has attracted considerable research effort over the last decade, and one of its fundamental problems is how to efficiently answer a keyword query under a given query semantics. We find that the key cause of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, the IDList; the IDList for keyword \(k\) consists of the ordered nodes that directly or indirectly contain \(k\). We then show that finding keyword query results under the smallest lowest common ancestor (SLCA) and exclusive lowest common ancestor (ELCA) semantics can be reduced to the ordered set intersection problem, which has been heavily optimized because of its applications in areas such as information retrieval and database systems. We propose several algorithms that perform set intersection in different directions, with or without additional indexes. We further propose several algorithms based on hash search that simplify finding the nodes common to all involved IDLists. We have conducted an extensive set of experiments against many state-of-the-art algorithms on several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
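The reduction described in the abstract can be illustrated with a small sketch. It assumes Dewey-style node labels (tuples of child positions) in place of the paper's own node encoding, and it uses a plain merge intersection rather than the optimized algorithms the paper proposes; all function names are hypothetical. Each keyword's IDList is the sorted set of nodes that contain the keyword directly or through a descendant, so intersecting the IDLists yields exactly the nodes that contain every keyword. The SLCAs are the common nodes that have no common descendant, and the ELCAs are the common nodes that still contain every keyword once the subtrees of their common descendants are excluded.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical node label: the Dewey path of child positions from the root.
# Python compares tuples lexicographically, which matches document (pre)order.
Dewey = Tuple[int, ...]


def build_idlists(occurrences: Dict[str, List[Dewey]]) -> Dict[str, List[Dewey]]:
    """For each keyword, list every node that directly or indirectly contains it."""
    idlists: Dict[str, List[Dewey]] = {}
    for keyword, nodes in occurrences.items():
        closure: Set[Dewey] = set()
        for node in nodes:
            # The node itself and every ancestor (each prefix) contain the keyword.
            for depth in range(1, len(node) + 1):
                closure.add(node[:depth])
        idlists[keyword] = sorted(closure)  # ordered and duplicate-free
    return idlists


def intersect_sorted(a: Sequence[Dewey], b: Sequence[Dewey]) -> List[Dewey]:
    """Merge-based intersection of two ordered, duplicate-free lists."""
    i = j = 0
    out: List[Dewey] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def common_nodes(idlists: Dict[str, List[Dewey]], query: List[str]) -> List[Dewey]:
    """Nodes containing every query keyword: the intersection of all IDLists."""
    lists = sorted((idlists.get(k, []) for k in query), key=len)  # shortest first
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    return common


def slca(common: List[Dewey]) -> List[Dewey]:
    """Common nodes with no descendant that is also a common node.

    In preorder, a node's subtree is contiguous and starts at the node, so it
    suffices to check whether the next common node lies inside that subtree.
    """
    result: List[Dewey] = []
    for idx, node in enumerate(common):
        nxt = common[idx + 1] if idx + 1 < len(common) else None
        if nxt is None or nxt[: len(node)] != node:
            result.append(node)
    return result


def elca(occurrences: Dict[str, List[Dewey]], query: List[str],
         common: List[Dewey]) -> List[Dewey]:
    """Common nodes that still contain every keyword after excluding the subtrees
    of their common descendants (a direct check, not an optimized algorithm)."""
    common_set = set(common)

    def owner(occ: Dewey) -> Optional[Dewey]:
        # Deepest ancestor-or-self of this occurrence that contains all keywords.
        for depth in range(len(occ), 0, -1):
            if occ[:depth] in common_set:
                return occ[:depth]
        return None

    owners = [{owner(o) for o in occurrences.get(k, [])} for k in query]
    return sorted(set.intersection(*owners) - {None})
```

For a toy document in which "xml" occurs at (0, 0, 0) and (0, 1, 0) and "zhou" occurs at (0, 0, 1) and (0, 2, 0), the common nodes are the root (0,) and (0, 0). The only SLCA is (0, 0), while the ELCAs are both (0,) and (0, 0), because the root still contains an "xml" and a "zhou" occurrence outside the subtree of (0, 0).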

Similar content being viewed by others

Schema-Independence in XML Keyword Search

No-but-semantic-match: computing semantically matched xml keyword search results (Article, 13 October 2017)

Efficient XML Keyword Search Based on DAG-Compression

Notes

If all lists are sorted in ascending order, the element in \(L_{i}\) that matches eliminator \(e\) is the smallest element of \(L_{i}\) that is greater than or equal to \(e\).
http://www.monetdb.org/Home.
http://www.informatik.uni-trier.de/~ley/db/.
http://www.oracle.com/technetwork/products/berkeleydb/overview/index.html.
In Fig. 18, to make a fair comparison, we use the result selectivity of existing methods as the result selectivity of our method.
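The first note defines the element matched to an eliminator. As a hedged illustration of how that definition drives an eliminator-based adaptive intersection, the sketch below cycles through the lists, finds each matched element with a galloping (exponential) lower-bound search, and either confirms the eliminator or replaces it with the larger matched element. It assumes strictly increasing integer lists, and the function names are hypothetical; it is a generic textbook-style variant, not necessarily the exact algorithm used in the paper.

```python
from bisect import bisect_left
from typing import List, Sequence


def gallop_lower_bound(lst: Sequence[int], target: int, start: int) -> int:
    """Smallest index i >= start with lst[i] >= target, or len(lst) if none.

    Probes start+1, start+2, start+4, ... until the target is bracketed,
    then binary-searches inside the bracket.
    """
    n = len(lst)
    if start >= n or lst[start] >= target:
        return start
    lo, step = start, 1          # invariant: lst[lo] < target
    hi = start + step
    while hi < n and lst[hi] < target:
        lo = hi
        step *= 2
        hi = start + step
    return bisect_left(lst, target, lo + 1, min(hi, n))


def eliminator_intersect(lists: List[Sequence[int]]) -> List[int]:
    """Intersect strictly increasing lists by repeatedly matching an eliminator."""
    if not lists or any(len(lst) == 0 for lst in lists):
        return []
    k = len(lists)
    if k == 1:
        return list(lists[0])
    pos = [0] * k
    result: List[int] = []
    eliminator = lists[0][0]     # first eliminator comes from list 0
    matches = 1                  # consecutive lists known to contain it
    i = 1                        # next list to probe
    while True:
        lst = lists[i]
        pos[i] = gallop_lower_bound(lst, eliminator, pos[i])
        if pos[i] == len(lst):   # no matched element: nothing more can be common
            return result
        matched = lst[pos[i]]    # smallest element >= eliminator (see note above)
        if matched == eliminator:
            matches += 1
            if matches == k:     # every list contains the eliminator
                result.append(eliminator)
                pos[i] += 1
                if pos[i] == len(lst):
                    return result
                eliminator = lst[pos[i]]
                matches = 1
        else:                    # larger matched element becomes the new eliminator
            eliminator = matched
            matches = 1
        i = (i + 1) % k
```

Galloping keeps the cost of each probe proportional to the logarithm of the distance skipped, which is why adaptive intersection of this kind tends to do well when one list is much shorter than the others.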

Acknowledgments

This research was partially supported by grants from the Natural Science Foundation of China (No. 61073060, 60833005, 61070055, 91024032, 91124001), the National Science and Technology Major Project (No. 2010-ZX01042-002-003), the Fundamental Research Funds for the Central Universities, the Research Funds of Renmin University of China (No. 11XNL010, 10XNI018), and the Research Funds of the Education Department of Hebei Province (No. Y2012014). Zhifeng Bao’s research is carried out at the SeSaMe Centre. It is supported by the Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO. Wei Wang was partially supported by ARC DP130103401 and DP130103405.

Author information

Authors and Affiliations

The Key Laboratory for Computer Virtual Technology and System Integration of HeBei Province, School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
Junfeng Zhou & Jinjia Zhao
Interactive Digital Media Institute, Singapore, Singapore
Zhifeng Bao
The University of New South Wales, Kensington, NSW, Australia
Wei Wang
Renmin University of China, Beijing, China
Xiaofeng Meng

Corresponding author

Correspondence to Junfeng Zhou.

Electronic supplementary material

Supplementary material 1 (pdf 422 KB)

About this article

Cite this article

Zhou, J., Bao, Z., Wang, W. et al. Efficient query processing for XML keyword queries based on the IDList index. The VLDB Journal 23, 25–50 (2014). https://doi.org/10.1007/s00778-013-0313-2

Received: 12 September 2012
Revised: 24 March 2013
Accepted: 29 March 2013
Published: 01 May 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s00778-013-0313-2
