Abstract
XPath query performance is a key factor in XML processing capacity, and exploiting multi-threaded computing resources for parallel processing is an important way to improve it. However, when XPath evaluation is parallelized, load imbalance and inefficient use of threads often degrade parallel performance. In this paper, we propose coPXQ, a parallel XPath query method based on cost optimization. The method improves the parallel processing of navigational XPath queries through a series of optimization measures. The main measures are as follows. First, the storage of the XML node relation index is optimized, which improves both the storage efficiency and the access efficiency of the index. Second, a new cost estimation method based on the number of XML node relations is used to balance load in parallel relation index creation and parallel primitive execution. Third, the number of worker threads is determined from an estimate of parallel effectiveness, so that threads are used effectively during a query. Experimental results show that our method achieves better parallel performance than existing typical methods.
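The second and third measures can be illustrated with a minimal sketch. It is not the coPXQ algorithm itself, whose details the abstract does not give. It assumes that each parallel task has an estimated cost equal to its number of XML node relations, balances tasks with a generic greedy longest-processing-time heuristic, and picks the largest worker count whose estimated efficiency stays above a threshold. The names `balance_by_cost` and `choose_workers` and the `min_efficiency` parameter are hypothetical.

```python
import heapq


def balance_by_cost(costs, workers):
    """Assign tasks to workers greedily, most expensive task first.

    costs[i] is the estimated cost of task i (assumed here to be its
    number of XML node relations). Returns (assignment, loads).
    """
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(workers)]
    for task in sorted(range(len(costs)), key=lambda t: -costs[t]):
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append(task)
        heapq.heappush(heap, (load + costs[task], w))
    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


def choose_workers(costs, max_workers, min_efficiency=0.8):
    """Return the largest worker count whose estimated efficiency,
    total_cost / (k * max_load), is at least min_efficiency (assumed rule)."""
    total = sum(costs)
    if total == 0:
        return 1  # nothing to parallelize
    best = 1
    for k in range(2, max_workers + 1):
        _, loads = balance_by_cost(costs, k)
        if total / (k * max(loads)) >= min_efficiency:
            best = k
    return best
```

For example, `balance_by_cost([8, 7, 6, 5, 4], 2)` yields worker loads of 17 and 13. In this sketch the efficiency estimate ignores thread start-up and synchronization overhead, which a real cost model would need to account for.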
Acknowledgements
This research was supported by the Natural Science Foundation of Fujian Province of China (2018J01538, 2020J01697), the Science Foundation of Jimei University (ZQ2014003), and the Open Fund of Digital Fujian Big Data Modeling and Intelligent Computing Institute.
Cite this article
Chen, R., Wang, Z., Su, H. et al. Parallel XPath query based on cost optimization. J Supercomput 78, 5420–5449 (2022). https://doi.org/10.1007/s11227-021-04074-y
DOI: https://doi.org/10.1007/s11227-021-04074-y