DOI: 10.1145/1367497.1367614
research-article

XML data dissemination using automata on top of structured overlay networks

Published: 21 April 2008

Abstract

We present a novel approach for filtering XML documents using nondeterministic finite automata (NFAs) and distributed hash tables (DHTs). Our approach differs architecturally from recent proposals for distributed XML filtering: those assume an XML broker architecture, whereas our solution is built on top of DHTs. At its core is a distributed implementation, on top of Chord, of YFilter, a state-of-the-art automata-based XML filtering system. We evaluate our approach experimentally and show that our algorithms scale to millions of XPath queries under various filtering scenarios while also exhibiting very good load-balancing properties.
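The abstract combines two ingredients: a YFilter-style NFA, in which all XPath queries are compiled into one automaton that shares common prefixes, and a Chord-style DHT across which that automaton is spread. The following is a minimal sketch of both, under loudly stated assumptions. Only linear path queries with `/`, `//` and `*` are handled (no predicates). The ring is static, with no finger tables or churn, and uses a 16-bit identifier space. Each NFA state is placed at the successor of the SHA-1 hash of its path prefix, which is a hypothetical placement rule. The names `SharedNFA`, `Ring`, `chord_id` and `placement` are illustrative only. This is not the paper's algorithm or YFilter's code.

```python
import hashlib
import re
import xml.etree.ElementTree as ET
from bisect import bisect_left

M = 16  # hypothetical id length in bits; Chord itself uses 160-bit SHA-1 ids


def chord_id(key: str) -> int:
    """Map a key onto the 2^M identifier circle using SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** M)


class Ring:
    """Static Chord-style ring (assumed: no finger tables, no churn)."""

    def __init__(self, node_names):
        self.nodes = sorted((chord_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key: str) -> str:
        # First node whose id >= hash(key); wrap around to the smallest id.
        pos = bisect_left(self.ids, chord_id(key))
        return self.nodes[pos % len(self.nodes)][1]


class State:
    def __init__(self, name: str, self_loop: bool = False):
        self.name = name            # path prefix leading here, e.g. "/a//b"
        self.children = {}          # element name or "*" -> State
        self.self_loop = self_loop  # "//" states stay active on any element
        self.epsilon = None         # epsilon edge to this state's "//" state
        self.accepts = set()        # ids of queries accepted in this state


class SharedNFA:
    """YFilter-style NFA: path queries with a common prefix share states."""

    def __init__(self):
        self.root = State("")
        self.states = {"": self.root}

    def _get(self, name: str, self_loop: bool = False) -> State:
        if name not in self.states:
            self.states[name] = State(name, self_loop)
        return self.states[name]

    def add_query(self, qid: str, xpath: str) -> None:
        """Insert a linear query built from /, // and * steps (no predicates)."""
        cur = self.root
        for axis, label in re.findall(r"(//|/)([^/]+)", xpath):
            if axis == "//":
                # "//" becomes an epsilon edge into a state with a self-loop.
                if cur.epsilon is None:
                    cur.epsilon = self._get(cur.name + "//", self_loop=True)
                cur = cur.epsilon
                child = self._get(cur.name + label)
            else:
                child = self._get(cur.name + "/" + label)
            cur.children[label] = child
            cur = child
        cur.accepts.add(qid)

    @staticmethod
    def _closure(states):
        # Only non-"//" states carry epsilon edges, so one pass suffices.
        return set(states) | {s.epsilon for s in states if s.epsilon is not None}

    def filter(self, xml_text: str) -> set:
        """Return ids of queries matched by some element path of the document."""
        matched = set()

        def visit(elem, active):
            nxt = set()
            for s in active:
                for key in (elem.tag, "*"):
                    if key in s.children:
                        nxt.add(s.children[key])
                if s.self_loop:
                    nxt.add(s)
            nxt = self._closure(nxt)
            for s in nxt:
                matched.update(s.accepts)
            for child in elem:
                visit(child, nxt)  # recursion stands in for YFilter's runtime stack
        visit(ET.fromstring(xml_text), self._closure({self.root}))
        return matched

    def placement(self, ring: Ring) -> dict:
        """Hypothetical mapping: the state for prefix p lives at successor(hash(p))."""
        return {name: ring.successor(name) for name in self.states}


if __name__ == "__main__":
    nfa = SharedNFA()
    for qid, q in {"q1": "/a//b", "q2": "/a/*", "q3": "//b", "q4": "/b"}.items():
        nfa.add_query(qid, q)
    print(sorted(nfa.filter("<a><c><b/></c></a>")))  # ['q1', 'q2', 'q3']
```

Here `q1` and `q2` share the state for `/a`, which is the prefix sharing that lets one automaton serve many queries. `/b` does not match because the root element is `a`. In a distributed setting, each transition into a state stored on a different node would presumably become a message to that node. This sketch does not model that traffic or the load-balancing behaviour the abstract reports.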

Cited By

  • (2016) A distributed selectivity-driven search strategy for semi-structured data over DHT-based networks. Journal of Parallel and Distributed Computing 93:C (10-29). DOI: 10.1016/j.jpdc.2016.03.015. Online publication date: 1-Jul-2016
  • (2016) LCA-based algorithms for efficiently processing multiple keyword queries over XML streams. Data & Knowledge Engineering 103:C (1-18). DOI: 10.1016/j.datak.2016.03.001. Online publication date: 1-May-2016
  • (2014) Tuning the continual flow pipeline architecture with virtual register renaming. ACM Transactions on Architecture and Code Optimization 11:1 (1-27). DOI: 10.1145/2579675. Online publication date: 1-Feb-2014

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN:9781605580852
DOI:10.1145/1367497
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automata
  2. load balancing
  3. structured overlay networks
  4. xml data dissemination

Conference

WWW '08

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
