A query refinement framework for xml keyword search

Bao, Zhifeng; Yu, Yi; Shen, Jian; Fu, Zhangjie

doi:10.1007/s11280-017-0447-z

Published: 15 March 2017

World Wide Web, Volume 20, pages 1469–1505 (2017)

Zhifeng Bao¹, Yi Yu², Jian Shen³ & Zhangjie Fu³

Abstract

Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming that each keyword is intended as part of it. However, in XML keyword search, user queries usually contain irrelevant or mismatched terms, typos, etc., which may easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined during the processing of Q, and find a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) we build a query ranking model to evaluate the quality of a refined query RQ, which captures the morphological/semantic similarity between Q and RQ and the dependency among the keywords of RQ over the XML data; (2) we integrate the exploration of RQ candidates and the generation of their matching results into a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.

Notes

Basically, it considers a node type t in the DTD of the XML data as an entity if t is “*”-annotated in the DTD. However, this may cause a multi-valued attribute to be mistakenly identified as an entity, so it usually requires verification and a decision from database administrators.
Where no ambiguity arises, we use “refinement rule” instead of “refinement rule instance” in the rest of the paper.
To simplify the discussion, the dissimilarity score of a single term deletion rule is 2 in all examples in this paper.
http://www.ibiblio.org/xml/books/biblegold/examples/baseball/
The URL is anonymized due to the double-blind review policy.
For ease of discussion, we refer to our refinement approach as XRefine in the rest of the paper.
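
The “*”-annotation heuristic described in the first note can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function name star_annotated_types, the regular expressions, and the toy team/player DTD are all assumptions made for illustration, and the parsing is deliberately simplified (it ignores quantifiers applied to parenthesised groups, parameter entities, and ANY/EMPTY content).

```python
import re

# Simplified match for <!ELEMENT name (content-model)> declarations.
# Assumption: no parameter entities and no ANY/EMPTY content models.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+\((.*?)\)\s*[*+?]?\s*>", re.S)

# A child name immediately followed by "*" inside a content model.
STARRED_CHILD = re.compile(r"([\w.-]+)\s*\*")


def star_annotated_types(dtd_text):
    """Return the node types that carry a "*" in some content model."""
    starred = set()
    for _parent, model in ELEMENT_DECL.findall(dtd_text):
        starred.update(STARRED_CHILD.findall(model))
    return starred


# Hypothetical toy DTD, not taken from the paper's data sets.
TOY_DTD = """
<!ELEMENT team (name, player*)>
<!ELEMENT player (name, position*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT position (#PCDATA)>
"""

print(sorted(star_annotated_types(TOY_DTD)))  # ['player', 'position']
```

In this toy DTD, player is plausibly an entity, whereas position is better read as a multi-valued attribute of player. The heuristic flags both, which is the kind of case where the note says verification by a database administrator is usually needed.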

Acknowledgments

This work is partially supported by the Australian Research Council’s Discovery Projects Scheme (DP170102726), the National Natural Science Foundation of China under Grant No. 91646204, and the JSPS KAKENHI Grant No. 16K16058.

Author information

Authors and Affiliations

Computer Science and IT, RMIT University, Melbourne, Australia
Zhifeng Bao
National Institute of Informatics, Tokyo, Japan
Yi Yu
School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
Jian Shen & Zhangjie Fu

Corresponding author

Correspondence to Zhifeng Bao.

About this article

Cite this article

Bao, Z., Yu, Y., Shen, J. et al. A query refinement framework for xml keyword search. World Wide Web 20, 1469–1505 (2017). https://doi.org/10.1007/s11280-017-0447-z

Received: 17 September 2016
Revised: 19 February 2017
Accepted: 23 February 2017
Published: 15 March 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11280-017-0447-z
