Multi-query processing of XML data streams on multicore

The Journal of Supercomputing

Abstract

Multicore architectures have become the norm in computing systems in recent years, providing CPU-level support for parallelism. However, existing algorithms for processing XML streams do not take full advantage of this capability because they were not designed to run in parallel. In this article, we propose several methods for efficiently parallelizing the finite state automata (FSA)-based XML stream processing technique. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel by exploiting index-level parallelism. Each core works only with its own query index, so no synchronization is needed while filtering XML streams against the path patterns given by users. We also present an in-memory MapReduce model that makes it possible to process a large collection of twig pattern joins over XML streams simultaneously. In our approach, twig pattern joins are shared among multiple hardware threads in a balanced way. Extensive experiments show that, on an 8-core CPU, our algorithm outperforms conventional algorithms by up to ten times when processing 10 million XPath expressions over XML streams.
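The index-level parallelism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes queries are simple paths using only the child axis and the `*` wildcard, models each query index as a trie-shaped automaton, and uses hypothetical names (`build_index`, `run_index`, `filter_stream`). Each worker reads the same event stream but touches only its own index, so no locking is needed. Note that CPython threads do not run Python code in parallel, so this shows only the partitioning structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(queries):
    """Build a trie-shaped automaton from (query_id, path) pairs.

    Assumption: paths look like '/a/b' or '/a/*/d' (child axis only).
    """
    root = {"next": {}, "accept": []}
    for qid, path in queries:
        state = root
        for step in path.strip("/").split("/"):
            state = state["next"].setdefault(step, {"next": {}, "accept": []})
        state["accept"].append(qid)
    return root


def run_index(index, events):
    """Run one query index over ('start', tag) / ('end', tag) events."""
    matched = set()
    stack = [[index]]  # one list of active states per open element
    for kind, tag in events:
        if kind == "start":
            active = []
            for state in stack[-1]:
                # Follow both the exact tag and the wildcard transition.
                for label in (tag, "*"):
                    nxt = state["next"].get(label)
                    if nxt is not None:
                        active.append(nxt)
                        matched.update(nxt["accept"])
            stack.append(active)
        else:
            stack.pop()
    return matched


def filter_stream(queries, events, workers=4):
    """Partition queries round-robin into one private index per worker."""
    events = list(events)
    parts = [queries[i::workers] for i in range(workers)]
    indexes = [build_index(part) for part in parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_index, indexes, [events] * workers)
        return set().union(*results)


queries = [(0, "/a/b"), (1, "/a/c"), (2, "/a/*/d"), (3, "/x")]
events = [("start", "a"), ("start", "b"), ("start", "d"),
          ("end", "d"), ("end", "b"), ("end", "a")]
matched = filter_stream(queries, events, workers=2)
```

Round-robin partitioning is chosen here only for simplicity; a balanced assignment would need some estimate of per-query cost, which this sketch does not attempt.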

Acknowledgments

This work was partly supported by two grants (2015K000260 and B0101-16-2666) funded by the Ministry of Science, ICT and Future Planning, and was also supported by KAIST and KISTI, Korea.

Author information

Corresponding author

Correspondence to Kyong-Ha Lee.

About this article

Cite this article

Kim, SH., Lee, KH. & Lee, YJ. Multi-query processing of XML data streams on multicore. J Supercomput 73, 2339–2368 (2017). https://doi.org/10.1007/s11227-016-1919-0

  • DOI: https://doi.org/10.1007/s11227-016-1919-0
