
Efficient processing of top-k twig queries over probabilistic XML data

World Wide Web

Abstract

The flexibility of the XML data model allows uncertain data to be represented more naturally than in the relational model. Matching twig patterns against XML data is a fundamental problem in querying information from XML documents. In a probabilistic XML document, each twig answer has a probability value because of the uncertainty of the data. Answers with small probability values are of little use, and users usually want only the answers with the k largest probability values. Existing algorithms for ordinary XML documents are not directly applicable to this task, because probabilistic XML requires handling probability distributional nodes and efficiently computing the top-k probabilities of answers. In this paper, we address the problem of finding twig answers with the top-k probability values directly over probabilistic XML documents. We propose a new encoding scheme for probabilistic XML, called PEDewey. Based on this encoding scheme, we design two algorithms for finding the answers with the top-k probabilities for twig queries. The first, ProTJFast, processes probabilistic XML data using element streams in document order; the second, PTopKTwig, uses element streams ordered by path probability values. Experiments have been conducted to study the performance of these algorithms.
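The abstract does not spell out how an answer's probability is computed or how PEDewey, ProTJFast, or PTopKTwig work, so the following is only a brute-force sketch under stated assumptions, not the paper's method. It assumes a PrXML-style model with IND (independent) and MUX (mutually exclusive) distributional nodes, a twig made only of ancestor-descendant edges, and hypothetical helper names (`Node`, `answer_probability`, `matches`, `top_k_twig`). Under that model, a set of matched nodes co-occurs with probability equal to the product of the distinct edge probabilities on their root paths, or 0 if two of those paths leave the same MUX node through different children; the k most probable answers are then selected with `heapq.nlargest`.

```python
import heapq
from itertools import product
from typing import List, Optional, Tuple


class Node:
    """Node of a PrXML-style tree (hypothetical model assumed for this sketch).

    kind is "ord" for an ordinary element, or "ind"/"mux" for a distributional
    node whose children are chosen independently / mutually exclusively.
    """

    def __init__(self, kind: str, label: str = "", name: str = ""):
        self.kind, self.label, self.name = kind, label, name
        self.parent: Optional["Node"] = None
        self.prob = 1.0  # probability of the edge parent -> self
        self.children: List["Node"] = []

    def add(self, child: "Node", prob: float = 1.0) -> "Node":
        child.parent, child.prob = self, prob
        self.children.append(child)
        return child


def answer_probability(nodes: List[Node]) -> float:
    """Probability that all matched nodes exist together in a random world."""
    mux_choice = {}  # MUX node -> the single child the answer requires under it
    edges = {}  # (parent, child) -> edge probability, so shared edges count once
    for n in nodes:
        cur = n
        while cur.parent is not None:
            par = cur.parent
            if par.kind == "mux" and mux_choice.setdefault(par, cur) is not cur:
                return 0.0  # two different children of one MUX node never co-occur
            edges[(par, cur)] = cur.prob
            cur = par
    p = 1.0
    for q in edges.values():
        p *= q
    return p


def descendants(node: Node):
    for c in node.children:
        yield c
        yield from descendants(c)


Twig = Tuple[str, list]  # (label, [child twigs]); every edge is ancestor-descendant


def matches(q: Twig, node: Node) -> List[List[Node]]:
    """All embeddings of twig q whose root is bound to node."""
    label, kids = q
    if node.kind != "ord" or node.label != label:
        return []
    per_child = []
    for kq in kids:
        options = [m for d in descendants(node) for m in matches(kq, d)]
        if not options:
            return []
        per_child.append(options)
    return [[node] + [n for part in combo for n in part] for combo in product(*per_child)]


def top_k_twig(root: Node, q: Twig, k: int) -> List[Tuple[float, List[str]]]:
    """Brute force: enumerate every embedding, score it, keep the k most probable."""
    scored = []
    for d in [root, *descendants(root)]:
        for ans in matches(q, d):
            p = answer_probability(ans)
            if p > 0:
                scored.append((p, [n.name for n in ans]))
    return heapq.nlargest(k, scored, key=lambda t: t[0])


# Example: a1 has an IND child (b1 at 0.8) and a MUX child (c1 0.6, c2 0.3, b2 0.1).
root = Node("ord", "A", "a1")
ind = root.add(Node("ind"))
b1 = ind.add(Node("ord", "B", "b1"), 0.8)
mux = root.add(Node("mux"))
c1 = mux.add(Node("ord", "C", "c1"), 0.6)
c2 = mux.add(Node("ord", "C", "c2"), 0.3)
b2 = mux.add(Node("ord", "B", "b2"), 0.1)

query: Twig = ("A", [("B", []), ("C", [])])  # the twig A//B, A//C
for p, names in top_k_twig(root, query, 2):
    print(f"{p:.2f} {names}")
```

In this toy document the top answers are (a1, b1, c1) with probability 0.8 × 0.6 and (a1, b1, c2) with probability 0.8 × 0.3; any answer pairing b2 with a C node is discarded because b2 and both C nodes hang off the same MUX node. Enumerating every embedding is exponential in the worst case, which is the motivation for dedicated algorithms. Note also that an answer's probability never exceeds the path probability (root-to-node product) of any node it contains, so a bound of this kind lets a scan over probability-ordered streams stop early; this is one plausible intuition for ordering streams by path probability, not a description of PTopKTwig itself.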




Author information

Correspondence to Bo Ning.

About this article

Cite this article

Ning, B., Liu, C. & Yu, J.X. Efficient processing of top-k twig queries over probabilistic XML data. World Wide Web 16, 299–323 (2013). https://doi.org/10.1007/s11280-011-0144-2

  • DOI: https://doi.org/10.1007/s11280-011-0144-2
