ABSTRACT
XML is the de facto standard for data representation and exchange over the Web. Given the diversity of the information available in XML, it is very useful to annotate XML data with a wide variety of meta-data, such as quality and sensitivity. When querying such XML data, say using XPath, it is important to efficiently identify the data that meet specified constraints on the meta-data. For example, different users may be satisfied with different levels of quality guarantees, or may have access only to different parts of the XML data based on specified security policies. In this paper, we address the problem of efficiently identifying the XML elements along a location step in an XPath query that satisfy meta-data range constraints, when the meta-data levels are drawn from an ordered domain (e.g., accuracy in [0,1], recency using timestamps, or multi-level security). More specifically, we develop a family of index structures, which we refer to as meta-data indexes, to address this problem. A meta-data index is easily instantiated using a multi-dimensional index structure, such as an R-tree, incorporating novel query and update algorithms. We show that the full meta-data index (FMI), based on associating each XML element with its meta-data level, has a very high update cost for modifying an element's meta-data level. We resolve this problem by designing the inheritance meta-data index (IMI), in which (i) actual meta-data levels are associated only with elements for which this value is explicitly specified, and (ii) inherited meta-data levels and inheritance source nodes are associated with non-leaf nodes of the index structure. We design efficient query (for all XPath axes) and update (of meta-data levels) algorithms for the IMI, and experimentally demonstrate the superiority of the IMI over the FMI using benchmark data sets.
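The contrast between storing a level on every element and storing only explicit levels can be illustrated with a small sketch. This is not the FMI or IMI themselves (no R-tree or other multi-dimensional index is built here); it is a plain in-memory tree that assumes an element without an explicit level inherits the level of its nearest ancestor that has one. The names `Element`, `descendants_in_range`, and `entries_touched_if_materialized` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Element:
    tag: str
    level: Optional[float] = None  # explicit meta-data level; None means "inherited"
    children: List["Element"] = field(default_factory=list)


def descendants_in_range(
    node: Element, lo: float, hi: float, inherited: Optional[float]
) -> Iterator[Tuple[str, float]]:
    """Walk the descendant axis of `node` in document order and yield
    (tag, effective level) for elements whose effective level lies in [lo, hi].
    `inherited` is the effective level of `node` itself (an assumption of
    this sketch: levels are inherited from the nearest annotated ancestor)."""
    for child in node.children:
        effective = child.level if child.level is not None else inherited
        if effective is not None and lo <= effective <= hi:
            yield child.tag, effective
        yield from descendants_in_range(child, lo, hi, effective)


def entries_touched_if_materialized(node: Element) -> int:
    """Count the per-element entries that would need rewriting if every element
    stored its own effective level and the level of `node` changed: `node`
    plus every descendant that inherits from it (the walk stops at any
    descendant with an explicit level)."""
    return 1 + sum(
        entries_touched_if_materialized(c) for c in node.children if c.level is None
    )


# A toy document with levels in [0, 1]; only three elements are annotated.
root = Element("catalog", 0.9, [
    Element("book", None, [Element("title"), Element("price", 0.4)]),
    Element("book", 0.6, [Element("title")]),
])

matches = list(descendants_in_range(root, 0.5, 1.0, root.level))
touched = entries_touched_if_materialized(root)
```

In this toy tree, changing the level of `catalog` would rewrite three materialized entries (the element itself and the two descendants that inherit from it), whereas a scheme that stores only explicit levels changes a single value. The gap grows with the size of the inheriting subtree, which is the update-cost concern the abstract raises for the FMI.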
Index Terms
- Meta-data indexing for XPath location steps