A Resource Efficient Hybrid Data Structure for Twig Queries

Wilson, John; Gourlay, Richard; Japp, Robert; Neumüller, Mathias

doi:10.1007/11841920_6

John Wilson²¹,
Richard Gourlay²¹,
Robert Japp²¹ &
…
Mathias Neumüller²¹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4156))

Included in the following conference series:

International XML Database Symposium

297 Accesses
3 Citations

Abstract

Designing data structures for use in mobile devices requires attention on optimising data volumes with associated benefits for data transmission, storage space and battery use. For semistructured data, tree summarisation techniques can be used to reduce the volume of structured elements while dictionary compression can efficiently deal with value-based predicates. This paper introduces an integration of the two approaches using numbering schemes to connect the separate elements, the key strength of this hybrid technique is that both structural and value predicates can be resolved in one graph, while further allowing for compression of the resulting data structure. Performance measures that show advantages of using this hybrid structure are presented, together with an analysis of query resolution using a number of different index granularities. As the current trend is towards the requirement for working with larger semi-structured data sets this work allows for the utilisation of these data sets whilst reducing both the bandwidth and storage space necessary.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Amato, G., Debole, F., Zezula, P., Rabitti, F.: YAPI: Yet another path index for XML searching. In: Proc of ECDL, pp. 176–187 (2003)
Google Scholar
Arion, A., Bonifati, A., Costa, G., D’Aguanno, S., Manolescu, I., Pugliese, A.: Xquec: Pushing queries to compressed XML data. In: VLDB, pp. 1065–1068 (2003)
Google Scholar
Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: Optimal XML pattern matching. In: Proc. of SIGMOD, pp. 310–321 (2002)
Google Scholar
Buneman, P., Grohe, M., Koch, C.: Path queries on compressed XML. In: Proc. of VLDB, pp. 141–152 (2003)
Google Scholar
Buneman, P., Choi, B., Fan, W., Hutchison, R., Mann, R., Viglas, S.D.: Vectorizing and querying large XML repositories. In: ICDE 2005: Proceedings of the 21st International Conference on Data Engineering, pp. 261–272. IEEE Computer Society, Los Alamitos (2005)
Google Scholar
Buneman, P., Davidson, S.B., Fernandez, M.F., Suciu, D.: Adding structure to unstructured data. In: Afrati, F.N., Kolaitis, P.G. (eds.) ICDT 1997. LNCS, vol. 1186, pp. 336–350. Springer, Heidelberg (1996)
Google Scholar
Chen, Q., Lim, A., Ong, K.W.: D(k)-index: an adaptive structural summary for graph-structured data. In: SIGMOD 2003: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pp. 134–144. ACM Press, New York (2003)
Chapter Google Scholar
Cheney, J.: Compressing XML with multiplexed hierarchical ppm models. In: Data Compression Conference, p. 163 (2001)
Google Scholar
Cheney, J.: Tradeoffs in XML database compression. In: DCC, pp. 392–401 (2006)
Google Scholar
Cheng, J., Ng, W.: Xqzip: Querying compressed XML using structural indexing. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 219–236. Springer, Heidelberg (2004)
Chapter Google Scholar
Chien, S.Y., Vagena, Z., et al.: Efficient structural joins on indexed XML documents. In: Proc. of VLDB, pp. 263–274 (2002)
Google Scholar
Choi, B., Buneman, P.: XML vectorization: A column-based XML storage model. Technical Report MS-CIS-03-17, University of Pennsylvania (2003)
Google Scholar
Dietz, P.F.: Maintaining order in a linked list. In: Proc. of STOCS, pp. 122–127 (1982)
Google Scholar
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and searching XML data via two zips. In: Proc. World Wide Web Conference 2006 (WWW 2006) (2006)
Google Scholar
Goldman, R., Widom, J.: DataGuides: Enabling query formulation and optimization in semistructured databases. In: Proc. of VLDB, pp. 436–445 (1997)
Google Scholar
Grust, T.: Accelerating XPath location steps. In: Proc. of SIGMOD, pp. 109–120 (2002)
Google Scholar
Halverson, A., Burger, J., et al.: Mixed mode XML query processing. In: Proc of VLDB, pp. 225–236 (2003)
Google Scholar
Kaushik, R., Bohannon, P., et al.: Covering indexes for branching path queries. In: Proc. of SIGMOD, pp. 133–144 (2002)
Google Scholar
Kaushik, R., Krishnamurthy, R., et al.: On the integration of structure indexes and inverted lists. In: Proc. of SIGMOD, pp. 779–790 (2004)
Google Scholar
Kaushik, R., Shenoy, P., et al.: Exploiting local similarity for indexing paths in graph-structured data. In: Proc. of ICDE, pp. 129–140 (2002)
Google Scholar
Kha, D.D., Yoshikawa, M., Uemura, S.: A structural numbering scheme for XML data. In: Proc. of EDBT, pp. 279–289 (2001)
Google Scholar
Lee, J., Lee, Y., Kang, S., Jin, H., Lee, S., Kim, B., Song, J.: Bmq-index: Shared and incremental processing of border monitoring queries over data streams. In: The 7th International Conference on Mobile Data Management (MDM 2006) (2006)
Google Scholar
Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proc. of VLDB, pp. 361–370 (2001)
Google Scholar
Liefke, H., Suciu, D.: XMill: An efficient compressor for XML data. In: Proc. of SIGMOD, pp. 153–164 (2000)
Google Scholar
McHugh, J., Widom, J.: Query optimization for XML. In: Proc. of VLDB, pp. 315–326 (1999)
Google Scholar
Min, J.-K., Park, M.-J., Chung, C.-W.: Xpress: A queriable compression for xml data. In: SIGMOD Conference, pp. 122–133 (2003)
Google Scholar
Naughton, J.F., DeWitt, D.J., et al.: The Niagara internet query system. IEEE Data Eng. Bull. 24(2), 27–33 (2001)
Google Scholar
Neumüller, M., Wilson, J.N.: Improving XML processing using adapted data structures. In: Proc. of WEBDB, pp. 63–77 (2002)
Google Scholar
Schmidt, A.R., Waas, F., et al.: XMark: A benchmark for XML data management. In: Proc. of VLDB, pp. 27–32 (2002)
Google Scholar
Tolani, P.M., Haritsa, J.R.: Xgrind: A query-friendly XML compressor. In: ICDE, pp. 225–234 (2002)
Google Scholar
Wang, W., Jiang, H., Wang, H., Lin, X., Lu, H., Li, J.: Efficient processing of XML path queries using the disk-based F&B Index. In: Proc. of VLDB, pp. 145–156 (2005)
Google Scholar
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD 2004: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pp. 335–346. ACM Press, New York (2004)
Chapter Google Scholar
Zezula, P., Amato, G., et al.: Tree Signatures for XML Querying and Navigation. In: Bellahsène, Z., Chaudhri, A.B., Rahm, E., Rys, M., Unland, R. (eds.) XSym 2003. LNCS, vol. 2824, pp. 149–163. Springer, Heidelberg (2003)
Chapter Google Scholar
Zhang, C., Naughton, J.F., et al.: On supporting containment queries in relational database management systems. In: Proc. of SIGMOD, pp. 425–436 (2001)
Google Scholar
Zhang, N., Ozsu, M.T., Ilyas, I.F., Aboulnaga, A., Cheriton, D.R.: Fix: Feature-based indexing technique for XML documents. Technical Report CS-2006-07, School of Computer Science, University of Waterloo (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK
John Wilson, Richard Gourlay, Robert Japp & Mathias Neumüller

Authors

John Wilson
View author publications
You can also search for this author in PubMed Google Scholar
Richard Gourlay
View author publications
You can also search for this author in PubMed Google Scholar
Robert Japp
View author publications
You can also search for this author in PubMed Google Scholar
Mathias Neumüller
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

AT&T Labs–Research, P.O. Box, 07932, Florham Park, NJ, USA
Sihem Amer-Yahia
LIRMM - UMR 5506 CNRS / Université Montpellier 2, 161 Rue Ada, P.O. Box, F-34392, Montpellier Cedex 5,
Zohra Bellahsène
Department of Computer Science, ETH Zurich, Switzerland
Ela Hunt
Universität Duisburg-Essen,
Rainer Unland
Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wilson, J., Gourlay, R., Japp, R., Neumüller, M. (2006). A Resource Efficient Hybrid Data Structure for Twig Queries. In: Amer-Yahia, S., Bellahsène, Z., Hunt, E., Unland, R., Yu, J.X. (eds) Database and XML Technologies. XSym 2006. Lecture Notes in Computer Science, vol 4156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11841920_6

Download citation

DOI: https://doi.org/10.1007/11841920_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38877-7
Online ISBN: 978-3-540-38879-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics