ABSTRACT
General-purpose streaming systems support diverse application domains with powerful and user-defined stream operators. Most general-purpose streaming systems have their own internal, non-XML data representation. However, streaming input is often either a sequence of small XML documents or a scan of one huge XML document. Prior work on XML streaming focuses on filtering XML rather than transforming it, and does not describe how to integrate with a general-purpose streaming system. This paper describes how to integrate an XML transformer with a streaming system by designing a specification syntax that is both consistent with the existing system and familiar to XML users. After type-checking the specification, we compile it to an efficient automaton driven by SAX events. Our approach extends the underlying streaming system with XML support without changing its core architecture, and the same technique could be used for other extensions beyond XML.
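The central mechanism here, an automaton that consumes SAX events and turns XML into tuples for a streaming system with a non-XML data representation, can be illustrated with a small Python sketch. This is not the paper's compiler or its specification syntax. The record/field specification (`TupleExtractor`, `record`, `fields`, `emit`) is a hypothetical stand-in, and the state machine is written by hand rather than generated from a type-checked specification. It relies only on the standard-library `xml.sax` module.

```python
import xml.sax


class TupleExtractor(xml.sax.ContentHandler):
    """Hand-written SAX-driven automaton (illustrative, not the paper's).

    Hypothetical spec: `record` names the element that delimits one tuple;
    `fields` lists the direct child elements whose text becomes tuple values.
    Each completed tuple is passed to the `emit` callback.
    """

    def __init__(self, record, fields, emit):
        super().__init__()
        self.record = record
        self.fields = fields
        self.emit = emit
        self.state = "OUTSIDE"  # OUTSIDE -> IN_RECORD <-> IN_FIELD -> OUTSIDE
        self.depth = 0          # current element nesting depth
        self.record_depth = 0   # depth at which the current record started
        self.current = None     # field values of the record being built
        self.field = None       # field whose text is being collected
        self.buf = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.state == "OUTSIDE" and name == self.record:
            self.state = "IN_RECORD"
            self.record_depth = self.depth
            self.current = dict.fromkeys(self.fields)  # missing fields stay None
        elif (self.state == "IN_RECORD" and name in self.fields
              and self.depth == self.record_depth + 1):
            # Only direct children of the record count as fields.
            self.state = "IN_FIELD"
            self.field = name
            self.buf = []

    def characters(self, content):
        if self.state == "IN_FIELD":
            self.buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if self.state == "IN_FIELD" and self.depth == self.record_depth + 1:
            self.current[self.field] = "".join(self.buf).strip()
            self.state = "IN_RECORD"
        elif self.state == "IN_RECORD" and self.depth == self.record_depth:
            self.emit(tuple(self.current[f] for f in self.fields))
            self.state = "OUTSIDE"
            self.current = None
        self.depth -= 1


doc = """<catalog>
  <item><name>pen</name><price>1.50</price></item>
  <item><name>ink</name><meta><name>ignored</name></meta></item>
</catalog>"""

rows = []
xml.sax.parseString(doc.encode("utf-8"),
                    TupleExtractor("item", ["name", "price"], rows.append))
print(rows)  # [('pen', '1.50'), ('ink', None)]
```

Because the handler holds only the record under construction plus a depth counter, its state is bounded by the size of one record rather than the whole input. The same handler therefore suits both input shapes named above: a sequence of small XML documents (one `parseString` call per document) or a single scan of a huge document.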