Delay aware querying with Seaweed

  • Special Issue Paper
The VLDB Journal

Abstract

Large, highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems and their data will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. These methods do not scale to systems of this size. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which gives the user explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting incremental results, online in-network aggregation and completeness prediction. It is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. Instead, it exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and also evaluate the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
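The delay/completeness trade-off described in the abstract can be illustrated with a small sketch. This is not Seaweed's actual algorithm: the `EndsystemSummary` type, the `mean_downtime_h` parameter, and the exponential return-time model are all assumptions made here for illustration. The sketch only shows how per-endsystem data summaries (estimated matching rows) and an availability model could, in principle, be combined into a predicted completeness for a given query delay.

```python
import math
from dataclasses import dataclass


@dataclass
class EndsystemSummary:
    """Hypothetical per-endsystem metadata (not Seaweed's real schema)."""
    est_rows: float         # estimated rows matching the query, from a data summary
    online: bool            # whether the endsystem is reachable right now
    mean_downtime_h: float  # assumed availability-model parameter, in hours (> 0)


def p_available_within(s: EndsystemSummary, delay_h: float) -> float:
    """Probability that the endsystem can answer within delay_h hours.

    Assumption: offline endsystems return after an exponentially
    distributed downtime; online endsystems answer immediately.
    """
    if s.online:
        return 1.0
    return 1.0 - math.exp(-delay_h / s.mean_downtime_h)


def predicted_completeness(summaries: list[EndsystemSummary], delay_h: float) -> float:
    """Expected fraction of matching rows returned after delay_h hours."""
    total = sum(s.est_rows for s in summaries)
    if total == 0:
        return 1.0  # nothing matches, so the result is trivially complete
    expected = sum(s.est_rows * p_available_within(s, delay_h) for s in summaries)
    return expected / total


if __name__ == "__main__":
    systems = [
        EndsystemSummary(est_rows=100, online=True, mean_downtime_h=1.0),
        EndsystemSummary(est_rows=100, online=False, mean_downtime_h=2.0),
    ]
    for delay in (0, 1, 2, 8):
        print(delay, round(predicted_completeness(systems, delay), 3))
```

Under these assumptions, predicted completeness starts at the fraction of matching rows held by currently available endsystems and rises toward 1 as the user tolerates more delay, which is the kind of explicit feedback the abstract attributes to completeness prediction.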

Author information

Corresponding author

Correspondence to Dushyanth Narayanan.

About this article

Cite this article

Narayanan, D., Donnelly, A., Mortier, R. et al. Delay aware querying with Seaweed. The VLDB Journal 17, 315–331 (2008). https://doi.org/10.1007/s00778-007-0060-3

  • DOI: https://doi.org/10.1007/s00778-007-0060-3
