Formalizing Compute-Aggregate Problems in Cloud Computing

  • Conference paper
Structural Information and Communication Complexity (SIROCCO 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11085)

Abstract

Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and the deployed routing and transport mechanisms play a key role in optimizing objectives such as cost and latency. We study the design principles of routing and transport infrastructure and identify additional information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services that unifies aggregation design principles, propose algorithms for each class, and analyze them.

This work was supported by the Russian Science Foundation grant 17-11-01276.

Author information

Corresponding author

Correspondence to Sergey Nikolenko.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Chuprikov, P., Davydow, A., Kogan, K., Nikolenko, S., Sirotkin, A. (2018). Formalizing Compute-Aggregate Problems in Cloud Computing. In: Lotker, Z., Patt-Shamir, B. (eds) Structural Information and Communication Complexity. SIROCCO 2018. Lecture Notes in Computer Science, vol 11085. Springer, Cham. https://doi.org/10.1007/978-3-030-01325-7_31

  • DOI: https://doi.org/10.1007/978-3-030-01325-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01324-0

  • Online ISBN: 978-3-030-01325-7

  • eBook Packages: Computer Science (R0)
