Skip to main content
Log in

Top-k answers for XML keyword queries

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most of the current approaches identify the meaningful results of a given keyword query based on the semantics of lowest common ancestor (LCA) and its variants. However, given the fact that LCA candidates are usually numerous and of low relevance to the users’ information need, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates is still a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers of a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed which rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach and the results show that the proposed approach has a good performance compared with other existing approaches in different data sets and evaluation metrics

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Auction. http://monetdb.cwi.nl/xmark/auctions.xml

  2. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1999)

    Google Scholar 

  3. Bartolini, I., Ciaccia, P., Patella, M.: Efficient sort-based skyline evaluation. ACM Trans. Database Syst. 33(4), 1–49 (2008). doi:10.1145/1412331.1412343

    Article  Google Scholar 

  4. Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and directed acyclic graphs. J. Algorithms 57, 75–94 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  5. Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proceedings of the 17th International Conference on Data Engineering, pp. 421–430. IEEE Computer Society, Washington, DC, USA (2001)

  6. Chomicki, J., Godfrey, P., Gryz, J., Liang, D.: Skyline with presorting. In: Proceedings of 19th International Conference on Data Engineering, vol. 717 (2003)

  7. Cohen, S., Kanza, Y., Kimelfeld, B., Sagiv, Y.: Interconnection semantics for keyword search in XML. In: CIKM ’05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 389–396. ACM, New York, NY, USA (2005)

  8. Cohen, S., Mamou, J., Kanza, Y., Sagiv, Y.: XSEarch: a semantic search engine for XML. In: Proceedings of the 29th International Conference on Very Large Data Bases, pp. 45–56. VLDB Endowment (2003)

  9. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, New York, NY, USA (1991)

    Book  MATH  Google Scholar 

  10. DBLP. dblp.uni-trier.de/xml/dblp.xml

  11. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 102–113. ACM, New York, NY, USA (2001)

  12. Guo, L., Shao, F., Botev, C., Shanmugasundaram, J.: XRANK: ranked keyword search over XML documents. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 16–27. ACM, New York, NY, USA (2003)

  13. Han, S.K., Shin, D., Jung, J.Y., Park, J.: Exploring the relationship between keywords and feed elements in blog post search. World Wide Web 12(4), 381–398 (2009)

    Article  Google Scholar 

  14. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  15. Hristidis, V., Koudas, N., Papakonstantinou, Y., Srivastava, D.: Keyword proximity search in XML trees. IEEE Trans. Knowl. Data Eng. 18, 525–539 (2006)

    Article  Google Scholar 

  16. Kossmann, D., Ramsak, F., Rost, S.: Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 275–286. VLDB Endowment (2002)

  17. Kung, H.T., Luccio, F., Preparata, F.P.: On finding the maxima of a set of vectors. Journal of the ACM, pp. 469–476. ACM, New York, NY, USA (1975)

    Google Scholar 

  18. Lee, K.H., Whang, K.Y., Han, W.S.: Xmin: Minimizing tree pattern queries with minimality guarantee. World Wide Web 13(3), 343–371 (2010)

    Article  Google Scholar 

  19. Li, G., Feng, J., Wang, J., Zhou, L.: Effective keyword search for valuable LCAs over XML documents. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 31–40. ACM, New York, NY, USA (2007)

  20. Li, G., Ooi, B.C., Feng, J., Wang, J., Zhou, L.: Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 903–914. ACM, New York, NY, USA (2008)

  21. Li, L., Otsuka, S., Kitsuregawa, M.: Finding related search engine queries by web community based query enrichment. World Wide Web 13(1), 121–142 (2010)

    Article  Google Scholar 

  22. Li, Y., Yu, C., Jagadish, H.V.: Enabling schema-free XQuery with meaningful query focus. VLDB J. 17(3), 355–377 (2008)

    Article  Google Scholar 

  23. Liu, Z., Chen, Y.: Identifying meaningful return information for xml keyword search. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 329–340. ACM, New York, NY, USA (2007)

  24. Liu, Z., Chen, Y.: Reasoning and identifying relevant matches for XML keyword search. In: Proceedings of the 34th International Conference on Very Large Data Bases, pp. 921–932 (2008)

  25. Liu, Z., Chen, Y.: Processing keyword search on XML: a survey. World Wide Web 14, 671–707 (2011)

    Article  Google Scholar 

  26. Liu, Z., Walker, J., Chen, Y.: XSeek: a semantic XML search engine using keywords. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 1330–1333. VLDB Endowment (2007)

  27. Mondial. http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml

  28. Online computer library center. Introduction to the Dewey Decimal Classification. http://www.oclc.org/oclc/fp/about/abouttheddc.htm

  29. Oracle Berkeley DB. http://www.oracle.com/technology/products/berkeley-db/index.html.

  30. Schieber, B., Vishkin, U.: On finding lowest common ancestors: simplification and parallelization. SIAM J. Comput. 17(6), 1253–1262 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  31. Shao, F., Guo, L., Botev, C., Bhaskar, A., Chettiar, M., Yang, F., Shanmugasundaram, J.: Efficient keyword search over virtual xml views. In: VLDB ’07: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 1057–1068. VLDB Endowment (2007)

  32. Sun, C., Chan, C.Y., Goenka, A.K.: Multiway slca-based keyword search in xml data. In: Proceedings of the 16th International Conference on World Wide Web, pp. 1043–1052. ACM, New York, NY, USA (2007)

  33. Tan, K.L., Eng, P.K., Ooi, B.C.: Efficient progressive skyline computation. In: VLDB ’01: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 301–310 (2001)

  34. Wang, G., Ning, B., Yu, G.: Holistically stream-based processing xtwig queries. World Wide Web 11(4), 407–425 (2008)

    Article  Google Scholar 

  35. Wang, J., Yu, J.X., Liu, C.: Independence of containing patterns property and its application in tree pattern query rewriting using views. World Wide Web 12(1), 87–105 (2009)

    Article  Google Scholar 

  36. Wu, X., Theodoratos, D., Souldatos, S., Dalamagas, T., Sellis, T.: Evaluation techniques for generalized path pattern queries on xml data. World Wide Web 13(4), 441–474 (2010)

    Article  Google Scholar 

  37. Xu, Y., Papakonstantinou, Y.: Efficient keyword search for smallest lcas in xml databases. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 527–538. ACM, New York, NY, USA (2005)

  38. Xu, Y., Papakonstantinou, Y.: Efficient lca based keyword search in xml data. In: Proceedings of the 11th International Conference on Extending Database Technology, pp. 535–546. ACM, New York, NY, USA (2008)

  39. XML Path Language (XPath). http://www.w3.org/tr/xpath/

  40. XQuery: An XML Query Language. http://www.w3.org/tr/xquery/

  41. Zhou, R., Liu, C., Li, J.: Fast elca computation for keyword queries on xml data. In: Proceedings of the 13th International Conference on Extending Database Technoloy, pp. 549–560. ACM, New York, NY, USA (2010)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jinli Cao.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nguyen, K., Cao, J. Top-k answers for XML keyword queries. World Wide Web 15, 485–515 (2012). https://doi.org/10.1007/s11280-011-0145-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-011-0145-1

Keywords

Navigation