Abstract
Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and the deployed routing and transport mechanisms play a key role in optimizing desired objectives such as cost and latency. We study the design principles of routing and transport infrastructure and identify extra information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services that unifies aggregation design principles, propose algorithms for each class, and analyze them.
This work was supported by the Russian Science Foundation grant 17-11-01276.
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Chuprikov, P., Davydow, A., Kogan, K., Nikolenko, S., Sirotkin, A. (2018). Formalizing Compute-Aggregate Problems in Cloud Computing. In: Lotker, Z., Patt-Shamir, B. (eds) Structural Information and Communication Complexity. SIROCCO 2018. Lecture Notes in Computer Science, vol 11085. Springer, Cham. https://doi.org/10.1007/978-3-030-01325-7_31
DOI: https://doi.org/10.1007/978-3-030-01325-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01324-0
Online ISBN: 978-3-030-01325-7
eBook Packages: Computer Science (R0)