Abstract
In this paper we investigate the problem of efficiently evaluating XPath queries over large XML data stored in a distributed manner. We propose a MapReduce algorithm, based on query decomposition, that computes all expected answers in a single MapReduce step. The algorithm applies to large XML data given either as a single distributed document or as a collection of small XML documents.
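As a rough illustration of the simpler of the two settings, a collection of small XML documents, the sketch below evaluates one path query in a single map and reduce round. It is not the algorithm proposed in the paper and omits query decomposition, which matters when a single document is split across nodes. The names `DOCS`, `map_phase`, `reduce_phase` and `run` are hypothetical, the rounds are simulated in one process, and the query is limited to the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input: a collection of small XML documents keyed by document id.
DOCS = {
    "d1": "<site><item><name>pen</name></item><item><name>ink</name></item></site>",
    "d2": "<site><item><name>cup</name></item></site>",
    "d3": "<site><person><name>Ann</name></person></site>",
}


def map_phase(doc_id, xml_text, query):
    """Evaluate the query on one whole document and emit (doc_id, answer) pairs."""
    root = ET.fromstring(xml_text)
    for node in root.findall(query):
        yield doc_id, node.text


def reduce_phase(pairs):
    """Group the emitted answers by key, standing in for shuffle and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def run(docs, query):
    """Simulate one MapReduce round sequentially (assumption: no real cluster)."""
    pairs = []
    for doc_id, xml_text in docs.items():
        pairs.extend(map_phase(doc_id, xml_text, query))
    return reduce_phase(pairs)


# Path relative to each document's root element.
answers = run(DOCS, "./item/name")
```

Because each small document fits in one map task, the query can be evaluated locally without coordination. The single distributed document case is harder, since a match may span fragments held by different nodes.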
This research was supported by the project “Handling Uncertainty in Data Intensive Applications”, co-financed by the European Union (European Social Fund - ESF) and Greek national funds, through the Operational Program “Education and Lifelong Learning”, under the research funding program THALES.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Damigos, M., Gergatsoulis, M., Plitsos, S. (2014). Distributed Processing of XPath Queries Using MapReduce. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_8
DOI: https://doi.org/10.1007/978-3-319-01863-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01862-1
Online ISBN: 978-3-319-01863-8
eBook Packages: Engineering, Engineering (R0)