Abstract
Keyword search is a user-friendly way to retrieve information from XML data. Because an XML document can be large and contain a great deal of information, a keyword search result should be a fragment of the document that is constructed dynamically at query time, which the structure of XML makes possible. Processing keyword searches on XML raises several challenges: Which elements in the XML document are relevant to the query? How can results be generated efficiently and ranked meaningfully? How should results be presented so that the user can quickly find the desired information? In this survey, we review the papers in the literature that attempt to address these problems. We divide the existing approaches into classes based on the problem they tackle and analyze these works comprehensively.
Cite this article
Liu, Z., Chen, Y. Processing keyword search on XML: a survey. World Wide Web 14, 671–707 (2011). https://doi.org/10.1007/s11280-011-0128-2
DOI: https://doi.org/10.1007/s11280-011-0128-2