Near-optimal communication-time tradeoff in fault-tolerant computation of aggregate functions

Zhao, Yuda; Yu, Haifeng; Chen, Binbin

doi:10.1007/s00446-015-0254-7


Published: 08 September 2015

Volume 29, pages 17–38, (2016)

Distributed Computing

Yuda Zhao¹,
Haifeng Yu¹ &
Binbin Chen²


Abstract

This paper considers the problem of computing general commutative and associative aggregate functions (such as Sum) over distributed inputs held by nodes in a distributed system, while tolerating failures. Specifically, there are N nodes in the system, and the topology among them is modeled as a general undirected graph. Whenever a node sends a message, the message is received by all of its neighbors in the graph. Each node has an input, and the goal is for a special root node (e.g., the base station in wireless sensor networks or the gateway node in wireless ad hoc networks) to learn a certain commutative and associative aggregate of all these inputs. All nodes in the system except the root node may experience crash failures, with the total number of edges incident to failed nodes being upper bounded by f. The timing model is synchronous, with protocols proceeding in rounds. Within such a context, we focus on the following question:

Under any given constraint on time complexity, what is the lowest communication complexity, in terms of the number of bits sent (i.e., locally broadcast) by each node, needed for computing general commutative and associative aggregate functions?

This work, for the first time, reduces the gap between the upper bound and the lower bound for the above question from polynomial to polylog. To achieve this reduction, we present significant improvements over both the existing upper bounds and the existing lower bounds on the problem.
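To make the model in the abstract concrete, the following is a minimal sketch of synchronous rounds with local broadcast and crash failures. It is a naive baseline for illustration only, not the protocol from the paper. The names `edges_incident_to_failed`, `naive_sum`, and `crash_round` are hypothetical, and the sketch assumes that a node crashing in round r sends nothing in round r or later.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed encoding)


def edges_incident_to_failed(graph: Graph, failed: Set[int]) -> int:
    """Count edges with at least one failed endpoint, each edge once.

    This is the quantity that the abstract bounds by f.
    """
    edges = set()
    for u in failed:
        for v in graph[u]:
            edges.add(frozenset((u, v)))
    return len(edges)


def naive_sum(graph: Graph, inputs: Dict[int, int], root: int,
              crash_round: Dict[int, int], rounds: int) -> int:
    """Naive baseline: in every round each live node locally broadcasts
    every (node id, input) pair it knows; the root sums what it has heard.

    Assumption: a node with crash_round[u] == r is silent from round r on.
    The caller is expected to keep the root out of crash_round.
    """
    known = {u: {u: inputs[u]} for u in graph}
    for r in range(rounds):
        alive = {u for u in graph if crash_round.get(u, float("inf")) > r}
        # Snapshot messages first so that all sends in a round are simultaneous.
        outbox = {u: dict(known[u]) for u in alive}
        for u in alive:
            for v in graph[u]:  # local broadcast: every neighbor receives
                if v in alive:
                    known[v].update(outbox[u])
    return sum(known[root].values())


# Path 0 - 1 - 2 - 3 with the root at node 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
values = {0: 1, 1: 2, 2: 3, 3: 4}
no_failure = naive_sum(path, values, 0, {}, 3)
early_crash = naive_sum(path, values, 0, {2: 0}, 3)
late_crash = naive_sum(path, values, 0, {2: 1}, 3)
```

In this sketch, a crash of node 2 before it sends anything cuts node 3 off from the root, so only the inputs of nodes 0 and 1 reach the root. The naive baseline sends far more bits per node than necessary; it only illustrates the setting in which the communication-time tradeoff is studied.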


Notes

For example, if a node fails or gets partitioned from the root (due to the failure of other nodes) right before the Sum protocol starts, incorporating the node’s input into the final sum would not be possible.
The model in [4] slightly differs from the model in this paper. But the results there can still be trivially adapted to this paper. Such trivial adaptation will be rigorously described in Sect. 11.2.
We have actually proved an upper bound of \(O\left( \left( \frac{f}{b}\log N+\log N\right) \cdot \min (b,f,\log N)\right) \). But for clarity, this paper uses the simpler form of \(O\left( \frac{f}{b}\log ^2 N+\log ^2 N\right) \) in most places. The main novelty in our lower bound is the \(\frac{f}{b\log b}\) term. The \(\frac{\log N}{\log b}\) term comes, in a relatively straightforward way, from applying the results in [7] to the output domain size of \(\varOmega (N)\).
We do not consider probabilistic failures (e.g., where each node fails i.i.d. with certain probability), which could be of separate interest but is beyond the scope of this paper.
Alternatively, one could define a result to be correct iff the result equals \(\diamond _{o\in s}o\) for some s where \(s_1\subseteq s \subseteq s_2\). All our theorems and proofs hold, without any modification, under such an alternative definition.
Throughout this paper, a node floods a certain message by first sending the message to its neighbors, and then the other nodes simply forward that message upon first receiving it.
Since B may have failed early on, we may not be able to actually get x. Nevertheless, one can achieve a similar functionality by using the maximum level information from B’s descendants. See Sect. 7 for details.
Note that the argument here relies on the fact that the Failed Parent Detection Phase is before the Failed Child Detection Phase.
The cycle promise described here is called the “alternative form” of the cycle promise in [4].
The result was originally stated for functions, though it trivially applies to partial functions as well.
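The flooding primitive described in the notes above (a node sends a message to its neighbors, and every other node forwards it upon first receiving it) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not code from the paper: the function name `flood` and the `failed` parameter are hypothetical, failed nodes are assumed to be silent from the start, and one forwarding step is assumed to take exactly one round.

```python
from typing import Dict, FrozenSet, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed encoding)


def flood(graph: Graph, source: int,
          failed: FrozenSet[int] = frozenset()) -> Dict[int, int]:
    """Return, for each node the message reaches, the round of first receipt.

    The source holds the message in round 0. Each node forwards the message
    once, in the round after it first receives it.
    """
    first_heard = {source: 0}
    frontier = [source]
    rnd = 0
    while frontier:
        rnd += 1
        next_frontier = []
        for u in frontier:
            for v in graph[u]:  # local broadcast to all neighbors
                if v not in failed and v not in first_heard:
                    first_heard[v] = rnd
                    next_frontier.append(v)
        frontier = next_frontier
    return first_heard


# Path 0 - 1 - 2 - 3, flooding from node 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
all_alive = flood(path, 0)
with_failure = flood(path, 0, frozenset({2}))
```

Because each node forwards only on first receipt, every live node sends the message at most once, and a node that is separated from the source by failed nodes never receives it.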

References

1. Bawa, M., Gionis, A., Garcia-Molina, H., Motwani, R.: The price of validity in dynamic networks. J. Comput. Syst. Sci. 73(3), 245–264 (2007)
2. Blokhuis, A.: On the Sperner capacity of the cyclic triangle. J. Algebraic Comb. 2(2), 123–124 (1993)
3. Calderbank, A.R., Frankl, P., Graham, R.L., Li, W.-C.W., Shepp, L.A.: The Sperner capacity of linear and nonlinear codes for the cyclic triangle. J. Algebraic Comb. 2(1), 31–48 (1993)
4. Chen, B., Yu, H., Zhao, Y., Gibbons, P.B.: The cost of fault tolerance in multi-party communication complexity. J. ACM 61(3), 19:1–19:64 (2014)
5. Considine, J., Li, F., Kollios, G., Byers, J.: Approximate aggregation techniques for sensor databases. In: ICDE (March 2004)
6. Frederickson, G.: Tradeoffs for selection in distributed networks. In: PODC (1983)
7. Impagliazzo, R., Williams, R.: Communication complexity with synchronized clocks. In: CCC (June 2010)
8. Kempe, D., Dobra, A., Gehrke, J.: Gossip-based computation of aggregate information. In: FOCS (October 2003)
9. Kuhn, F., Locher, T., Wattenhofer, R.: Tight bounds for distributed selection. In: SPAA (2007)
10. Kuhn, F., Lynch, N., Oshman, R.: Distributed computation in dynamic graphs. In: STOC (2010)
11. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, Cambridge (1996)
12. Madden, S., Franklin, M., Hellerstein, J., Hong, W.: TAG: a tiny aggregation service for ad-hoc sensor networks. In: OSDI (December 2002)
13. Mosk-Aoyama, D., Shah, D.: Computing separable functions via gossip. In: PODC (July 2006)
14. Nath, S., Gibbons, P.B., Seshan, S., Anderson, Z.: Synopsis diffusion for robust aggregation in sensor networks. ACM Trans. Sens. Netw. 4(2), 7:1–7:40 (2008)
15. Newman, I.: Private versus common random bits in communication complexity. Inform. Process. Lett. 39(2), 67–71 (1991)
16. Patt-Shamir, B.: A note on efficient aggregate queries in sensor networks. In: PODC (2004)
17. Shrira, L., Francez, N., Rodeh, M.: Distributed k-selection: from a sequential to a distributed algorithm. In: PODC (1983)


Acknowledgments

We thank Faith Ellen, the PODC 2014 anonymous reviewers, and the Distributed Computing anonymous reviewers for many helpful comments on this paper.

Author information

Authors and Affiliations

School of Computing, National University of Singapore, Computing 1, 13 Computing Drive, Singapore, 117417, Republic of Singapore
Yuda Zhao & Haifeng Yu
Advanced Digital Sciences Center, 1 Fusionopolis Way, #08-10 Connexis North Tower, Singapore, 138632, Republic of Singapore
Binbin Chen


Corresponding author

Correspondence to Haifeng Yu.

Additional information

A preliminary version of this work appeared in the Proceedings of the ACM Symposium on Principles of Distributed Computing (PODC), 2014. This work is partly supported by Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2011-T2-2-042, and partly supported by the research grant for the Human Sixth Sense Programme at the Advanced Digital Sciences Center from Singapore’s Agency for Science, Technology and Research (A*STAR).


About this article

Cite this article

Zhao, Y., Yu, H. & Chen, B. Near-optimal communication-time tradeoff in fault-tolerant computation of aggregate functions. Distrib. Comput. 29, 17–38 (2016). https://doi.org/10.1007/s00446-015-0254-7


Received: 19 September 2014
Accepted: 29 August 2015
Published: 08 September 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s00446-015-0254-7
