Multi-query processing of XML data streams on multicore

Kim, Soo-Hyung; Lee, Kyong-Ha; Lee, Yoon-Joon

doi:10.1007/s11227-016-1919-0

Published: 12 November 2016

Volume 73, pages 2339–2368 (2017)

The Journal of Supercomputing

Abstract

Multicore architectures have become the norm for computing systems in recent years because they provide CPU-level support for parallelism. However, existing algorithms for processing XML streams do not take full advantage of this capability, since they were not designed to run in parallel. In this article, we propose several methods to efficiently parallelize the finite state automata (FSA)-based XML stream processing technique. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel by exploiting index-level parallelism. Each core works only with its own query index, so no synchronization issues arise while filtering XML streams against the multiple path patterns given by users. We also present an in-memory MapReduce model that makes it possible to process a large collection of twig pattern joins over XML streams simultaneously. In our approach, twig pattern joins are performed by multiple hardware threads in a shared and balanced way. Extensive experiments show that, on an 8-core CPU, our algorithm outperforms conventional algorithms by up to ten times when processing 10 million XPath expressions over XML streams.
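
The idea of index-level parallelism can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_index`, `run_index`, and `filter_stream` are hypothetical, queries are restricted to absolute child-axis paths such as `/a/b/c` (no `//`, `*`, or predicates), round-robin partitioning is an assumed choice, and the XML stream is assumed to arrive as a list of `("start", tag)` and `("end", tag)` events. Each worker receives its own automaton and keeps its own state stack, so workers share no mutable data and need no locks.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(queries):
    """Build a trie-shaped automaton from (query_id, path) pairs."""
    root = {"next": {}, "accept": []}
    for qid, path in queries:
        state = root
        for tag in path.strip("/").split("/"):
            state = state["next"].setdefault(tag, {"next": {}, "accept": []})
        state["accept"].append(qid)  # reaching this state matches query qid
    return root


def run_index(index, events):
    """Run one query index over the event stream with a private stack."""
    stack = [index]  # None on the stack marks a dead state
    matched = set()
    for kind, tag in events:
        if kind == "start":
            top = stack[-1]
            nxt = top["next"].get(tag) if top is not None else None
            stack.append(nxt)
            if nxt is not None:
                matched.update(nxt["accept"])
        else:  # "end": leave the element, restore the parent state
            stack.pop()
    return matched


def filter_stream(queries, events, workers=2):
    """Partition the queries, build one index per worker, merge the matches."""
    parts = [queries[i::workers] for i in range(workers)]  # round-robin split
    indexes = [build_index(p) for p in parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_index, indexes, [events] * workers)
    merged = set()
    for r in results:
        merged |= r
    return merged


QUERIES = [
    (0, "/site/people/person"),
    (1, "/site/regions"),
    (2, "/site/people/person/name"),
    (3, "/catalog/item"),
]
# Events for <site><people><person><name/></person></people></site>
EVENTS = [
    ("start", "site"), ("start", "people"), ("start", "person"),
    ("start", "name"), ("end", "name"), ("end", "person"),
    ("end", "people"), ("end", "site"),
]

print(sorted(filter_stream(QUERIES, EVENTS, workers=2)))  # [0, 2]
```

The thread pool here only shows the structure of the partitioning. In CPython the global interpreter lock prevents threads from running this code on several cores at once, so the sketch demonstrates that the per-index work is independent, not that it yields a speedup.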

Acknowledgments

This work was partly supported by two Grants (2015K000260 and B0101-16-2666) funded by the Ministry of Science, ICT and Future Planning and also supported by KAIST and KISTI, Korea.

Author information

Authors and Affiliations

School of Computing, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Korea
Soo-Hyung Kim & Yoon-Joon Lee
Division of Convergence Technology Research, KISTI, 245 Daehak-ro, Yuseong-gu, Daejeon, 34141, Korea
Kyong-Ha Lee

Corresponding author

Correspondence to Kyong-Ha Lee.

About this article

Cite this article

Kim, SH., Lee, KH. & Lee, YJ. Multi-query processing of XML data streams on multicore. J Supercomput 73, 2339–2368 (2017). https://doi.org/10.1007/s11227-016-1919-0

Published: 12 November 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11227-016-1919-0
