Skip to main content
Log in

Answering XML Queries Using Path-Based Indexes: A Survey

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

The problem of answering XML queries using path-based indexes is to find efficient methods for accelerating the XML query with pre-designed index structures over the XML database. This problem received increasing interests and have been lucubrated in recent years. Regular path expression is the core of the XML query languages e.g., XPath and XQuery. Most of the state-of-the-art path-based XML indexes, therefore, hammer at how to efficiently answer the path-based XML queries. This paper surveys various approaches to indexing XML data proposed in the literature. We give a step by step analysis to show the evolution of index structures for XML path information, based on tree structures or more commonly, directed labeled graphs. For each approach, we first present the specific issue it aims to tackle, and then the proposed solution presented. Furthermore, construction, physical data storage and maintenance costs, are analyzed.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Abiteboul, S.: Querying semi-structured data. In: Proceedings of ICDT ’97, (1997)

  2. Abiteboul, S., Quass, D., McHugh, J., Widom, J., Wiener, J.: The Lorel query language for semistructured data. Int. J. Digit. Lib. 1(1), 68–88 (April, 1997)

    Google Scholar 

  3. Abiteboul, S., Buneman, P., Suciu, D.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco, California 94104–3205, USA

  4. Al-Khalifa, S., Jagadish, H.V., Koudas, N., Patel, J.M., Srivastava, D., Wu, Y.: Structural Joins: A primitive for efficient XML query pattern matching. In: Proceedings of ICDE '02, (2002)

  5. Barg, M., Wong. R.K.: A fast and versatile path index for querying semi-structured data. In: Proceedings of DASFAA '03, (2003)

  6. Berglund, A., Baog, S., Chamberlin, D., et al.: XML Path Languages (XPath), ver 2.0. W3Cworking Draft, 2001. Tech. Report WD-xpath20-20011220, W3C, 2001, http://www.w3.org/TR/WD-xpath20-20011220

  7. Biron, P.V., Malhotra, A.: XML Schema Part 2: Datatypes W3C Recommendation 02, (May 2001). http://www.w3.org/TR/xmlschema-2/

  8. Buneman, P., Davidson, S.B., Hillebrand, G.G., Suciu, D.: A query language and optimization techniques for unstructured data. In: Proceedings of SIGMOD '96, (1996)

  9. Chamberlin, D., Robie, J., Florescu, D.: Quilt: An XML query language for heterogeneous data sources. In: Proceedings of Third International Workshop WebDB, (2000).

  10. Chamberlin, D., Clark, J., Florescu, D., et al.: XQuery 1.0: An XML Query Language. W3C working draft. Technical Report WD-xquery-20010607, World Wide Web Consortium, (2001)

  11. Chen, Q., Lim, A., Ong, K.W.: D(K)-Index: An adaptive structural summary for graph-structured data. In: Proceedings of SIGMOD '03, (2003)

  12. Cheng, J., Yu, G., Yu, J.X., Wang. G.: PathGuide: An efficient clustering based indexing method for XML path expressions. In: Proceedings of DASFAA ’03, (2003)

  13. Chien, S.-Y., Vagena, Z., Zhang, D., Tsotras, V.J., Zaniolo, C.: Efficient structural joins on indexed XML documents. In: Proceedings of VLDB ’02, (2002)

  14. Christophides, V., Abiteboul, S., Cluet, S., Scholl, M.: From structured documents to novel query facilities. In: Proceedings of SIGMOD '94, (1994)

  15. Chung, C.-W., Min, J.-K., Shim, K.: APEX: An adaptive path index for XML data. In: Proceedings of SIGMOD '02, (2002)

  16. Cooper, B.F., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A fast index for semistructured data. In: Proceedings of VLDB '01, (2001)

  17. Deutsch, A., Fernandez, M., Florescu, D., Levy, A., Suciu, D.: XML-QL: A query language for XML. W3C Working Draft. Submission to the World Wide Web Consortium 19-August-1998. Available from http://www.w3.org/TR/NOTE-xml-ql., (1998)

  18. Deutsch, A., Fernandez, M., Suciu, D.: Storing semistructured data with STORED. In: Proceedings of SIGMOD '99, (1999)

  19. Fernándex, M., Florescu, D., Levy, A., Suciu, D.: Declarative specificaiton of web sites with strudel. VLDB J. 9(1), (2000). Electronic edition

  20. Flajolet, P., Sedgewick, R.: Digital search trees revisited. SIAM J. Comput 15(3), 748–767 (1986)

    Google Scholar 

  21. Florescu, D., Kossmann, D.: A performance evaluation of alternative mapping schemes for storing XML data In: a relational database. (1999). Survey report

  22. Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization In: semistructured databases. In: Proceedings of VLDB '97, (1997)

  23. Goldman, R., Widom, J.: Approximate dataguides. In: Proceedings of the Workshop on Query Processing for Semistructured Data and Non-Standard Data Formats, (1999)

  24. Gonnet, G.: Efficient searching of text and pictures (extended abstract). Technical Report OED-88-02, University of Waterloo, (1988)

  25. Harding, P.J., Li, Q., Moon, B.: XISS/R: XML indexing and storage system using RDBMS. In: Proceedings of VLDB '03, (2003)

  26. He, H., Yang, J.: Multiresolution indexing of XML for frequent queries. In: Proceedings of ICDE '04, (2004)

  27. Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, Massachusetts, (1979), Date

    MATH  Google Scholar 

  28. The Internet Movie Database Ltd. Internet movie database: http://www.imdb.com

  29. Jagadish, H.V., Al-Khalifa, S., Chapman, A., Lakshmanan, L.V., Nierman, A., Paparizos, S., Patel, J.M., Srivastava, D., Wiwatwattana, N., Wu, Y., Yu, C.: TIMBER: a native XML database. VLDB J. 11 (2002)

  30. Jiang, H., Lu, H., Wang, W., Yu, J.X.: Path materialization revisited: An efficient storage model for XML data. In: Proceedings of ADC '02, (2002)

  31. Jiang, H., Lu, H., Wang, W., Ooi, B.C.: XR-Tree: Indexing XML data for efficient structural join. In: Proceedings of ICDE '03, (2003)

  32. Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. In: Proceedings of SIGMOD '02, (2002)

  33. Kaushik, R., Bohannon, P., Naughton, J.F., Shenoy, P.: Updates for structure indexes. In: Proceedings of VLDB '02, (2002)

  34. Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths In: graph-structured data. In: Proceedings of ICDE '02, (2002)

  35. Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proceedings of VLDB '01, (2001)

  36. Lu, H., Wang, G., Yu, G., Bao, Y., Lv, J., Yu, Y.: XBase: Making your gigabyte disk files queriable. In: Proceedings of SIGMOD '02, (2002)

  37. McHugh, J., Abiteboul, S., Goldman, R., Quass, D., Widom, J.: Lore: a database management system for semistructured data. Sigmod Rec. 26(3), 54–66 (1997)

    Google Scholar 

  38. Mc Hugh, J., Widom, J., Abiteboul, S., Luo, Q., Rajamaran, A.: Indexing semistructured data. Technical report, Stanford University, Computer Science Department, Database Group, (1998)

  39. Mendelzon, A., Mihaila, G., Milo, T.: Querying the world wide web. In: Proceedings of the Fourth Conference on Parallel and Distributed Information Systems, (1996)

  40. Milo, T., Suciu, D.: Index structures for path expressions. In: Proceedings of ICDT '99, (1999)

  41. Morrison, D.R.: PATRICIA-practical algorithm to retrieve information coded In: alphanumeric. Journal of the Association of Computing Machinary 15(4), 514–534 (1968)

  42. NASA XML Group. http://www.imdb.com

  43. Nestorov, S., Ullman, J., Wiener, J., Chawathe, S.: Representative objects: concise representation of semistructured, hierarchical data. In: Proceedings of ICDE '97, (1997)

  44. Open Directory Project. DMOZ open directory project. http://www.dmoz.org

  45. Paige, R., Tarjan, R.: Three partition refinement algorithms. SIAM J. Comput 16(6), 973–989 (1987)

    Google Scholar 

  46. Papakonstantinou, Y., Garcia-Molina, H., Widom, J.: Object exchange across heterogeneous information sources. In: Proceedings of ICDE '95, (1995)

  47. Prüfer, H.: Neuer beweis eines satzes über permutationen. Archiv für Mathematik und Physik 27, 142–144 (1918)

    MATH  Google Scholar 

  48. Rao, P., Moon, B.: PRIX: Indexing and querying XML using prüfer sequences. In: Proceedings of ICDE '04, (2004)

  49. Sacks-Davis, R., Arnold-Moore, T., Zobel, J.: Database systems for structured documents. In: International Symposium on Advanced Database Technologies and Their Integration (ADTI '94), Nara, Japan, October (1994)

  50. Schmidt, A., Kersten, M.L., Windhouwer, M., Waas, F.: Efficient relational storage and retrieval of XML documents. In: Proceedings of WebDB '00, (2000)

  51. Schmidt, A., Waas, F., Kersten, M., Carey, M.J., Manolescu, I., Busse, R.: XMark: A benchmark for XML data management. In: Proceedings of VLDB '02, (2002)

  52. Software AG: Tamino XML database. http://www.softwareag.com/tamino/

  53. van Zwol, R., Apers, P., Wilschutz, A.: Implementing semi-structured data with MOA. In: Workshop on Query Processing for Semistructured data and Non-Standard Data Formats (in conjunction with ICDT), (1999)

  54. Wang, G., Liu, M.: Query processing and optimization for regular path expressions. In: Proceedings of CAiSE '03, (2003)

  55. Wang, H., Park, S., Fan, W., Yu, P.S.: ViST: A DynamicInde x Method for Querying XML Data by Tree Structures. In: Proceedings of SIGMOD '03, (2003)

  56. Wang, G., Sun, B., Lv, J., Yu, G.: Rpe query processing and optimization techniques for xml databases. J. Comput Sci. Technol 19(2), 224–237 (March, 2004)

    Article  MATH  Google Scholar 

  57. XYZFind: XML database. http://www.xyzfind.com/

  58. Yoshikawa, M., Amagasa, T., Shimura, T., Uemura, S.: Xrel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Transaction Internet Technology 1(1), 110–141 (2001)

    Google Scholar 

  59. Zhou, A., Lu, H., Zheng, S., Liang, Y., Zhang, L., Ji, W., Tian, Z.: VXMLR: A visual XML-relational database system. In: Proceedings of VLDB '01, (2001)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nan Tang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wong, KF., Yu, J.X. & Tang, N. Answering XML Queries Using Path-Based Indexes: A Survey. World Wide Web 9, 277–299 (2006). https://doi.org/10.1007/s11280-006-8557-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-006-8557-z

Keywords

Navigation