Formalizing Compute-Aggregate Problems in Cloud Computing

Chuprikov, Pavel; Davydow, Alex; Kogan, Kirill; Nikolenko, Sergey; Sirotkin, Alexander

doi:10.1007/978-3-030-01325-7_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11085)

Included in the following conference series:

International Colloquium on Structural Information and Communication Complexity

Abstract

Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and the deployed routing and transport mechanisms play a fundamental role in optimizing desired objectives such as cost and latency. We study the design principles of routing and transport infrastructure and identify extra information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services that unifies aggregation design principles, propose algorithms for each class, and analyze them.

This work was supported by the Russian Science Foundation grant 17-11-01276.

Author information

Authors and Affiliations

Steklov Institute of Mathematics at St. Petersburg, St. Petersburg, Russia
Pavel Chuprikov, Alex Davydow, Sergey Nikolenko & Alexander Sirotkin
IMDEA Networks Institute, Madrid, Spain
Pavel Chuprikov & Kirill Kogan
National Research University Higher School of Economics, St. Petersburg, Russia
Alexander Sirotkin

Corresponding author

Correspondence to Sergey Nikolenko.

Editor information

Editors and Affiliations

Ben-Gurion University of the Negev, Beer-Sheva, Israel
Zvi Lotker
Tel Aviv University, Tel Aviv, Israel
Boaz Patt-Shamir

About this paper

Cite this paper

Chuprikov, P., Davydow, A., Kogan, K., Nikolenko, S., Sirotkin, A. (2018). Formalizing Compute-Aggregate Problems in Cloud Computing. In: Lotker, Z., Patt-Shamir, B. (eds) Structural Information and Communication Complexity. SIROCCO 2018. Lecture Notes in Computer Science, vol 11085. Springer, Cham. https://doi.org/10.1007/978-3-030-01325-7_31

DOI: https://doi.org/10.1007/978-3-030-01325-7_31
Published: 31 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01324-0
Online ISBN: 978-3-030-01325-7
eBook Packages: Computer Science, Computer Science (R0)