Skip to main content
Log in

Automatic parallelization of XQuery programs on multi-core systems

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

The popularity of multi-core systems makes software parallelization become an important way to improve performance. As a mainstream XML query language, XQuery is the core of XML processing. It is critical to take full advantage of multi-core computing to improve XML processing performance through parallelization of XQuery. However, usually it is difficult to parallelize XQuery programs because of the nested style of XQuery expressions. Moreover, implicit parallelism is necessary to simplify the development of parallel XML application. In this paper, we propose an automatic parallelization approach, which can automatically select proper types of parallelism for a specific XQuery query. Specifically, we propose a functional intermediate language called pFXQL (parallel Functional XML Query Language) to describe parallel query plans. pFXQL has parallel semantics and is well complied with XQuery. We propose a cost model to effectively support plan generation and the selection of a preferred plan. The model estimates both computational cost and parallel cost. We implement our approach in XQuery engine and conduct experiments on various multi-core systems. Experimental results verify the effectiveness of our approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

References

  1. Alexandrov A, Ewen S, Heimel M, Hueske F, Kao O, Markl V, Nijkamp E, Warneke D (2011) MapReduce and PACT-comparing data parallel programming models. In: Proceedings of the 14th conference on database systems for business, technology, and web (BTW), pp 25–44

  2. Bamford R, Borkar V, Brantner M, Fischer PM, Florescu D, Graf D, Kossmann D, Kraska T, Muresan D, Nasoi S (2009) XQuery reloaded. Proc VLDB Endow 2(2):1342–1353

    Article  Google Scholar 

  3. Berglund A, Boag S, Chamberlin D, Fernandez MF, Kay M, Robie J, Siméon J (2015) XML Path Language (XPath) 2.0 (Second Edition). http://www.w3.org/TR/xpath20/

  4. Berglund A, Fernández M, Malhotra A, Marsh J, Nagy M, Walsh N (2015) XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition). http://www.w3.org/TR/xpath-datamodel/

  5. Bidoit N, Colazzo D, Sartiani C, Solimando A, Ulliana F (2015) Andromeda: a system for processing queries and updates on big XML documents. In: Morzy T, Valduriez P, Bellatreche L (eds) New trends in databases and information systems: ADBIS 2015 short papers and workshops, BigDap, DCSA, GID, MEBIS, OAIS, SW4CH, WISARD. Springer International Publishing, Cham, pp 218–228. doi:10.1007/978-3-319-23201-0_24

  6. Boag S, Chamberlin D, Fernández MF, Florescu D, Robie J, Siméon J, Stefanescu M (2015) XQuery 1.0: An XML Query Language (Second Edition). http://www.w3.org/TR/xquery/

  7. Bordawekar R, Lim L, Kementsietsidis A, Kok BWL (2010) Statistics-based parallelization of XPath queries in shared memory systems. In: Proceedings of the 13th International Conference on Extending Database Technology (EDBT2010), pp 159–170

  8. Camachorodríguez J, Colazzo D, Manolescu I (2014) PAXQuery: a massively parallel XQuery processor. In: Proceedings of workshop on data analytics in the cloud (Danac’14), pp 1–4. doi:10.1145/2627770.2627772

  9. Carman EP Jr, Vinayak TW, Borkar R, Michael J, Carey, Tsotras VJ (2015) Apache VXQuery: a scalable XQuery implementation. arXiv:1504.00331 (CoRR)

  10. Chamberlin D, Fankhauser P, Florescu D, Marchiori M, Robie J (2007) XML query use cases. http://www.w3.org/TR/xquery-use-cases/

  11. Chen R, Liao H, Chen W (2011) XML parsing schema based on parallel sub-tree construction. Comput Sci 38(3):191–194

    Google Scholar 

  12. Damigos M, Gergatsoulis M, Plitsos S (2014) Distributed processing of XPath queries using mapReduce. In: Catania B, Cerquitelli T, Chiusano S et al. (eds) New trends in databases and information systems: 17th east european conference on advances in databases and information systems. Springer International Publishing, Cham, pp 69–77. doi:10.1007/978-3-319-01863-8_8

  13. Dongarra J (2007) The promise and perils of the coming multicore revolution and its impact. CTWatch Q 3(1):1–33

    MathSciNet  Google Scholar 

  14. Draper D, Fankhauser P, Fernández MF, Malhotra A, Rose K, Rys M, Siméon J, Wadler P (2015) XQuery 1.0 and XPath 2.0 Formal Semantics (Second Edition). http://www.w3.org/TR/xquery-semantics/

  15. Fegaras L, Dash R, Wang YH (2006) A fully pipelined XQuery processor. In: XQuery Implementation, Experience and Perspectives (XIME-P’06) Workshop

  16. Fernández M, Jim T, Morton K, Onose N, Simeon J (2007) DXQ: a distributed XQuery scripting language. In: Proceedings of the 4th international workshop on XQuery implementation, experience and perspectives, pp 1–6

  17. Fluet M, Rainey M, Reppy J, Shaw A (2008) Implicitly-threaded parallelism in Manticore. ACM Sigplan Not 9:119–130

    Article  MATH  Google Scholar 

  18. Frasincar F, Houben G-J, Pau C (2002) XAL: an algebra for XML query optimization. Aust Comput Sci Commun 24 (2):49–56. doi:10.1145/563932.563912

  19. Hammond K (1994) Parallel functional programming: an introduction. In: International Symposium on Parallel Symbolic Computation, Citeseer

  20. Herlihy M, Shavit N (2008) The art of multiprocessor programming. Morgan Kaufmann Publisher, Burlington

    Google Scholar 

  21. Hidaka S, Kato H, Yoshikawa M (2007) A relative cost model for XQuery. In: Proceedings of the 2007 ACM symposium on Applied computing, pp 1332–1333

  22. Jones SP, Leshchinskiy R, Keller G, Chakravarty MMT (2008) Harnessing the multicores: nested data parallelism in Haskell. In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2008) Dagstuhl, Citeseer, Germany

  23. Kahn G (1987) Natural semantics. Springer, Heidelberg

    Book  MATH  Google Scholar 

  24. Kepser S (2004) A simple proof for the Turing-completeness of XSLT and XQuery. In: Extreme Markup Languages

  25. Khatchadourian S, Consens MP, Siméon J (2011) Having a ChuQL at XML on the Cloud. In: Mendelzon Int’l A. Workshop

  26. Kling P, Ozsu MT, Daudjee K (2010) Distributed XML query processing: Fragmentation, localization and pruning. University of Waterloo, Technical Report CS-2010-02

  27. Li X (2006) Efficient and parallel evaluation of XQuery. The Ohio State University

  28. Liao H (2013) Principles and implementation techniques of XQuery. Science Press, Beijing

    Google Scholar 

  29. Liao H, Shan W, Gao H (2013) Automatic parallelization of XQuery programs. J Softw 8(4):842–851

    Article  Google Scholar 

  30. Liao H, Tang L (2009) A decorrelation method based on XQA Query algebra. J Beijing Univ Technol 35(8):1108–1114

  31. Loogen R, Ortega-Mallén Y, Peña-Marí R (2005) Parallel functional programming in Eden. J Funct Program 15(03):431–475

    Article  MATH  Google Scholar 

  32. Lu W, Gannon D (2007) Parallel XML processing by work stealing. In: Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches, pp 31–38

  33. Machdi I, Amagasa T, Kitagawa H (2010) Parallel holistic twig joins on a multi-core system. Int J Web Inform Syst 6(2):149–177

    Article  Google Scholar 

  34. Miao H, Nie T, Yue D, Zhang T, Liu J (2012) Algebra for parallel XQuery processing. In: Bao Z, Gao Y, Gu Y et al. (eds) Web-Age information management: WAIM 2012 international workshops: GDMM, IWSN, MDSP, USDM, and XMLDM. Springer, Berlin, Heidelberg, pp 1–10. doi:10.1007/978-3-642-33050-6_1

  35. Quan Y, Liao H (2014) A Task Scheduling Algorithm for Automated Parallel Processing of XQuery. In: the Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP2014) 2014. pp 250–254

  36. Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Workshop on Information Integration on the Web, Citeseer, pp 116–121

  37. Re C, Siméon J, Fernandez M (2006) A complete and efficient algebraic compiler for XQuery. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), pp 14–25

  38. Schmidt A, Waas F, Kersten M, Carey MJ, Manolescu I, Busse R (2002) XMark: a benchmark for XML data management. In: Proceedings of the 28th international conference on Very Large Data Bases, 2002. VLDB Endowment, pp 974–985

  39. Seshadri P, Pirahesh H, Leung TYC (1996) Complex query decorrelation. In: Proceedings of the Twelfth International Conference on Data Engineering, pp 450–458

  40. Shnaiderman L, Shmueli O (2015) Multi-Core Processing of XML Twig Patterns. IEEE Trans Knowl Data Eng 27(4):1057–1070

    Article  Google Scholar 

  41. Sivaramakrishnan KC, Ziarek L, Prasad R, Jagannathan S (2010) Lightweight asynchrony using parasitic threads. In: Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming (DAMP2010), pp 63–72

  42. Spoonhower D, Blelloch GE, Harper R (2007) A Semantic Framework for Scheduling Parallel Programs. Computer Science Department 886

  43. Sutter H (2005) The free lunch is over: a fundamental turn toward concurrency in software. http://www.gotw.ca/publications/concurrency-ddj.htm

  44. Wilkinson B, Allen M (2005) Parallel programming: techniques and applications using networked workstations and parallel computers, 2nd edn. Prentice hall, New Jersey

    Google Scholar 

  45. Yui M, Miyazaki J, Uemura S, Kato H (2008) XBird/D: distributed and parallel XQuery processing using remote proxy. In: Proceedings of the 2008 ACM symposium on Applied computing, pp 1003–1007

  46. Zhang X, Liao H (2010) A framework for XQuery system with XML algebra and tree pattern query. J Front Comput Sci Technol 4(11):996–1004

    MathSciNet  Google Scholar 

  47. Zhang Y, Tang N, Boncz P (2009) Efficient distribution of full-fledged XQuery. In: IEEE 25th International Conference on Data Engineering (ICDE’09) IEEE, pp 565–576

Download references

Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 41201462), the Natural Science Foundation of Fujian Province of China (No. 2014J01245) and the Science Foundation of Jimei University (No. ZQ2014003). Many thanks are due to Dr. Huang Bin for reading this paper and offering helpful advice.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rongxin Chen.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chen, R., Liao, H., Wang, Z. et al. Automatic parallelization of XQuery programs on multi-core systems. J Supercomput 72, 1517–1548 (2016). https://doi.org/10.1007/s11227-016-1679-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-016-1679-x

Keywords

Navigation