From Searching Text to Querying XML Streams

  • Conference paper
String Processing and Information Retrieval (SPIRE 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2476)

Abstract

XML data is queried with XPath expressions, which are a limited form of regular expressions. New XML stream processing applications, such as content-based routing or selective dissemination of information, require thousands or millions of XPath expressions to be evaluated simultaneously on the incoming XML stream at a high, sustained rate. Conceptually, the XPath evaluation problem is analogous to the text search problem, in which one or several regular expressions need to be matched against a given text. Here, however, the number of regular expressions is much larger, while the “text” is much shorter, since its length corresponds to the depth of the XML stream. In this paper we examine techniques that have been proposed for XML stream processing, which are variations of either non-deterministic or deterministic finite automata (NFA and DFA). For the latter, we describe a series of theoretical results establishing lower and upper bounds on the number of DFA states for sets of XPath expressions.
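
The automaton view described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a small XPath fragment (child steps `/`, descendant steps `//`, the wildcard `*`, and no predicates), and the names `parse`, `LazyDFA`, and `run` are hypothetical. Each DFA state is a set of NFA states, and transitions are computed on demand and cached, which is one common way to avoid building the full DFA up front. The stack holds one DFA state per open element, so the "text" the automaton reads is the path from the root to the current element.

```python
import re


def parse(xpath):
    """Split e.g. '/a//b/*' into (axis, name) steps; axis is 'child' or 'desc'."""
    return tuple(
        ("desc" if sep == "//" else "child", name)
        for sep, name in re.findall(r"(//|/)([^/]+)", xpath)
    )


class LazyDFA:
    """DFA over element labels whose states are sets of (query, steps_matched)."""

    def __init__(self, xpaths):
        self.queries = [parse(x) for x in xpaths]
        # Initially every query has matched zero steps.
        self.start = frozenset((q, 0) for q in range(len(self.queries)))
        self.table = {}  # (state, label) -> state, filled on demand

    def step(self, state, label):
        key = (state, label)
        if key not in self.table:
            nxt = set()
            for q, i in state:
                steps = self.queries[q]
                if i == len(steps):
                    continue  # query already fully matched at the parent
                axis, name = steps[i]
                if axis == "desc":
                    nxt.add((q, i))  # '//' may skip over this element
                if name == "*" or name == label:
                    nxt.add((q, i + 1))  # this element satisfies step i
            self.table[key] = frozenset(nxt)
        return self.table[key]

    def accepted(self, state):
        """Indexes of the queries that select the current element."""
        return {q for q, i in state if i == len(self.queries[q])}


def run(dfa, events):
    """Consume ('start', label) / ('end', label) events; return (query, label) hits."""
    stack = [dfa.start]
    matches = []
    for kind, label in events:
        if kind == "start":
            state = dfa.step(stack[-1], label)
            stack.append(state)
            matches.extend((q, label) for q in sorted(dfa.accepted(state)))
        else:
            stack.pop()  # closing tag: return to the parent's state
    return matches


# Events for the document <a><b><c/></b><d><b/></d></a>.
EVENTS = [
    ("start", "a"), ("start", "b"), ("start", "c"), ("end", "c"), ("end", "b"),
    ("start", "d"), ("start", "b"), ("end", "b"), ("end", "d"), ("end", "a"),
]
dfa = LazyDFA(["/a/b", "//b", "/a/*/c"])
print(run(dfa, EVENTS))
print(len(dfa.table), "DFA transitions materialized")
```

Because transitions are cached per (state, label) pair, the per-element cost after warm-up is one dictionary lookup regardless of how many expressions are loaded; the price is the number of distinct states that may be materialized, which is the quantity the bounds mentioned in the abstract are concerned with.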

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suciu, D. (2002). From Searching Text to Querying XML Streams. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_2

  • DOI: https://doi.org/10.1007/3-540-45735-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44158-8

  • Online ISBN: 978-3-540-45735-0

  • eBook Packages: Springer Book Archive
