Abstract
Multicore architectures have become the norm for computing systems in recent years because they provide CPU-level support for parallelism. However, existing algorithms for processing XML streams do not take full advantage of this capability, since they were not designed to run in parallel. In this article, we propose several methods for efficiently parallelizing the finite state automata (FSA)-based XML stream processing technique. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel by exploiting index-level parallelism. Each core works only with its own query index, so no synchronization is needed while filtering XML streams against the multiple path patterns given by users. We also present an in-memory MapReduce model that can process a large collection of twig pattern joins over XML streams simultaneously. In our approach, twig pattern joins are performed by multiple hardware threads in a shared and balanced way. Extensive experiments show that, on an 8-core CPU, our algorithm outperforms conventional algorithms by up to ten times when processing 10 million XPath expressions over XML streams.
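The idea of index-level parallelism can be illustrated with a minimal sketch. It is not the article's implementation: it assumes only absolute child-axis paths such as /a/b/c, uses a plain prefix tree as a stand-in for the FSA-based query index, and partitions queries round-robin. The names build_index, filter_stream, and parallel_filter are hypothetical. Each worker reads the same event stream but touches only its own index, so the workers share no mutable state and need no locks. (In CPython, threads illustrate the structure only; the global interpreter lock prevents a real multicore speedup.)

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(queries):
    """Build a trie-shaped automaton from (query_id, path) pairs.

    Assumption: only absolute child-axis paths such as '/a/b/c'.
    """
    root = {"next": {}, "accept": []}
    for qid, path in queries:
        node = root
        for step in path.strip("/").split("/"):
            node = node["next"].setdefault(step, {"next": {}, "accept": []})
        node["accept"].append(qid)
    return root


def filter_stream(index, events):
    """Run one index over ('start', tag) / ('end', tag) events."""
    matched = set()
    stack = [index]  # one automaton state per open element; None = dead state
    for kind, tag in events:
        if kind == "start":
            state = stack[-1]
            nxt = state["next"].get(tag) if state is not None else None
            if nxt is not None:
                matched.update(nxt["accept"])
            stack.append(nxt)
        else:
            stack.pop()  # closing an element restores the parent's state
    return matched


def parallel_filter(queries, events, workers=4):
    """Filter one stream against all queries using one index per worker."""
    # Round-robin partitioning is an assumption made for this sketch.
    parts = [queries[i::workers] for i in range(workers)]
    indexes = [build_index(part) for part in parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every worker scans the same read-only event list with its own index.
        results = list(pool.map(filter_stream, indexes, [events] * workers))
    return set().union(*results)


if __name__ == "__main__":
    # Events for the document <a><b><c/></b><d/></a>
    events = [("start", "a"), ("start", "b"), ("start", "c"), ("end", "c"),
              ("end", "b"), ("start", "d"), ("end", "d"), ("end", "a")]
    queries = [(0, "/a/b"), (1, "/a/b/c"), (2, "/a/d"), (3, "/a/x"), (4, "/b")]
    print(sorted(parallel_filter(queries, events, workers=2)))
```

Because the per-index match sets are merged only after all workers finish, the result does not depend on how many indexes the queries are split into.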

















Acknowledgments
This work was partly supported by two grants (2015K000260 and B0101-16-2666) funded by the Ministry of Science, ICT and Future Planning, and by KAIST and KISTI, Korea.
Cite this article
Kim, SH., Lee, KH. & Lee, YJ. Multi-query processing of XML data streams on multicore. J Supercomput 73, 2339–2368 (2017). https://doi.org/10.1007/s11227-016-1919-0
DOI: https://doi.org/10.1007/s11227-016-1919-0