Delay aware querying with Seaweed

Narayanan, Dushyanth; Donnelly, Austin; Mortier, Richard; Rowstron, Antony

doi:10.1007/s00778-007-0060-3

Special Issue Paper
Published: 05 September 2007

The VLDB Journal, Volume 17, pages 315–331 (2008)

Dushyanth Narayanan¹,
Austin Donnelly¹,
Richard Mortier¹ &
Antony Rowstron¹

Abstract

Current query technologies support large, highly distributed data sets poorly. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10³ to 10⁹ endsystems) and endsystem unavailability: in such large systems, a significant fraction of endsystems and their data will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data, and these methods do not scale to systems of this size. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models; this metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting incremental results, online in-network aggregation and completeness prediction. It is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. Instead, it exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and also evaluate the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
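To make the delay/completeness trade-off concrete, the following is a minimal sketch of how a completeness prediction could be computed from per-endsystem summaries and availability models. It is not Seaweed's actual algorithm, which the abstract does not specify. The names `Endsystem`, `est_rows`, `p_up_within`, `exponential_return` and `predicted_completeness`, and the exponential availability model, are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Endsystem:
    # Rows relevant to the query, as estimated from a replicated data
    # summary (hypothetical field; the real summary format is not given).
    est_rows: float
    # Whether the endsystem is reachable right now.
    online_now: bool
    # Hypothetical availability model: probability that an offline
    # endsystem comes back within t hours.
    p_up_within: Callable[[float], float]


def exponential_return(mean_hours: float) -> Callable[[float], float]:
    """Assumed model: time until return is exponentially distributed."""
    return lambda t: 1.0 - math.exp(-t / mean_hours)


def predicted_completeness(endsystems: List[Endsystem], delay_hours: float) -> float:
    """Expected fraction of relevant rows returned within delay_hours."""
    total = sum(e.est_rows for e in endsystems)
    if total == 0:
        return 1.0  # nothing relevant anywhere, so the answer is already complete
    expected = sum(
        e.est_rows * (1.0 if e.online_now else e.p_up_within(delay_hours))
        for e in endsystems
    )
    return expected / total


if __name__ == "__main__":
    systems = [
        Endsystem(100, True, exponential_return(1.0)),
        Endsystem(100, False, exponential_return(8.0)),
    ]
    for t in (0.0, 1.0, 8.0, 24.0):
        print(f"{t:5.1f} h -> {predicted_completeness(systems, t):.2f}")
```

Under these assumptions, a user could scan the printed curve and decide how long to wait: the prediction only needs the small per-endsystem metadata, not the data itself.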

Author information

Authors and Affiliations

Microsoft Research, 7 JJ Thomson Avenue, Cambridge, CB3 0FB, UK
Dushyanth Narayanan, Austin Donnelly, Richard Mortier & Antony Rowstron

Corresponding author

Correspondence to Dushyanth Narayanan.

About this article

Cite this article

Narayanan, D., Donnelly, A., Mortier, R. et al. Delay aware querying with Seaweed. The VLDB Journal 17, 315–331 (2008). https://doi.org/10.1007/s00778-007-0060-3

Received: 15 February 2007
Revised: 09 May 2007
Accepted: 18 May 2007
Published: 05 September 2007
Issue Date: March 2008
DOI: https://doi.org/10.1007/s00778-007-0060-3
