Abstract
We in this paper investigate keyword search over data-centric XML documents. We first present a novel method to divide an XML document into self-integrated subtrees, which are connected subtrees and can capture different structural information of the XML document. We then propose the meaningful self-integrated trees, which contain all the keywords and describe how the keywords are interrelated, to answer keyword search over XML documents. In addition, we introduce the B + -tree index to accelerate the retrieval of those meaningful self-integrated trees. Moreover, to further enhance the performance of keyword search, we present Bloom Filter to improve the efficiency of generating those meaningful self-integrated trees. Finally, we conducted extensive experiments to evaluate the performance of our method, and the experimental results demonstrate that our method achieves high efficiency and outperforms the existing approaches significantly.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Agrawal, S., Chaudhuri, S., Das, G.: Dbxplorer: A system for keyword-based search over relational databases. In: ICDE, pp. 5–16 (2002)
Amer-Yahia, S., Botev, C., Dorre, J., Shanmugasundaram, J.: Xquery full-text extensions explained. IBM Systems Journal 45(2), 335–352 (2006)
Amer-Yahia, S., Curtmola, E., Deutsch, A.: Flexible and efficient xml search with complex full-text predicates. In: SIGMOD, pp. 575–586 (2006)
Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., Toman, D.: Structure and content scoring for xml. In: VLDB, pp. 361–372 (2005)
Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., Sudarshan, S.: Keyword searching and browsing in databases using banks. In: ICDE, pp. 431–440 (2002)
Bloom, B.: Space/time trade-offs in hash coding with allowable errors. Communications of ACM 13(7), 422–426 (1970)
Botev, C., Amer-Yahia, S., Shanmugasundaram, J.: Expressiveness and Performance of Full-Text Search Languages. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 349–367. Springer, Heidelberg (2006)
Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic ranking of database query results. In: VLDB, pp. 888–899 (2004)
Cohen, S., Kanza, Y., Kimelfeld, B., Sagiv, Y.: Interconnection semantics for keyword search in xml. In: CIKM, pp. 389–396 (2005)
Cohen, S., Mamou, J., Kanza, Y., Sagiv, Y.: Xsearch: A semantic search engine for xml. In: VLDB, pp. 45–56 (2003)
Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: Xrank: Ranked keyword search over xml documents. In: SIGMOD, pp. 16–27 (2003)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Hristidis, V., Gravano, L., Papakonstantinou, Y.: Efficient ir-style keyword search over relational databases. In: VLDB, pp. 850–861 (2003)
Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in xml trees. IEEE Trans. Knowl. Data Eng. 18(4) (2006)
Hristidis, V., Papakonstantinou, Y.: Discover: Keyword search in relational databases. In: VLDB, pp. 670–681 (2002)
Hristidis, V., Papakonstantinou, Y., Balmin, A.: Keyword proximity search on xml graphs. In: ICDE, pp. 367–378 (2003)
Kimelfeld, B., Sagiv, Y.: Finding and approximating top-k answers in keyword proximity search. In: PODS, pp. 173–182 (2006)
Li, Y., Yang, H., Jagadish, H.V.: Nalix: an interactive natural language interface for querying xml. In: SIGMOD, pp. 900–902 (2005)
Li, Y., Yang, H., Jagadish, H.V.: Constructing a Generic Natural Language Interface for an XML Database. In: Ioannidis, Y., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 737–754. Springer, Heidelberg (2006)
Li, Y., Yu, C., Jagadish, H.V.: Schema-free xquery. In: VLDB, pp. 72–84 (2004)
Liu, F., Yu, C., Meng, W., Chowdhury, A.: Effective keyword search in relational databases. In: SIGMOD, pp. 563–574 (2006)
Marais, J., Bharat, K.: Supporting cooperative and personal surfing with a desktop assistant. In: ACM UIST, ACM Press, New York (1997)
Marian, A., Amer-Yahia, S., Koudas, N., Srivastava, D.: Adaptive processing of top-k queries in xml. In: ICDE, pp. 162–173 (2005)
Pradhan, S.: An algebraic query model for effective and efficient retrieval of xml fragments. In: VLDB, pp. 295–306 (2006)
Schieber, B., Vishkin, U.: On finding lowest common ancestors: Simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)
Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest lcas in xml databases. In: SIGMOD, pp. 527–538 (2005)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Li, G., Feng, J., Ta, N., Zhou, L. (2007). Efficient Keyword Search over Data-Centric XML Documents. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds) Advances in Data and Web Management. APWeb WAIM 2007 2007. Lecture Notes in Computer Science, vol 4505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72524-4_51
Download citation
DOI: https://doi.org/10.1007/978-3-540-72524-4_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72483-4
Online ISBN: 978-3-540-72524-4
eBook Packages: Computer ScienceComputer Science (R0)