Abstract
MapReduce supports the parallel processing of large data sets. It has been shown that MapReduce is an example of the use of the bulk synchronous parallel (BSP) bridging model, a model for parallel computation on a fixed set of processors that alternates between computation and communication phases. In this article we extend the normal execution of MapReduce from processing large finite data sets to processing stream queries, where the input data stream is assumed to continue indefinitely. We classify stream queries into three classes (memoryless, semi-memoryless and memorable) and provide a model for each class using MapReduce based on BSP. In addition, as some stream queries require large amounts of computing resources, the BSP computation model is extended to a model with unboundedly many agents that still preserves the barrier synchronization. A behavioral theory is developed for this model, extending the behavioral theory of the BSP model. It comprises an axiomatization, the definition of Infinite-Agent BSP abstract state machines (Inf-Ag-BSP-ASM), and the proof that such ASMs capture the unbounded synchronized computations. Finally, we show how MapReduce processing can be further improved on the basis of the unbounded extension.
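The alternation of computation and communication phases described in the abstract can be illustrated with a minimal sketch. It is not the paper's formal model: the function names, the word-count query, the fixed-size windowing of the stream, and the partitioner are all assumptions chosen for illustration. Each window of the stream is handled by one MapReduce round, and the point at which the shuffle has delivered all pairs stands in for the barrier.

```python
from collections import defaultdict
from itertools import islice


def partition(key, num_workers):
    # Stable, illustrative partitioner (hypothetical choice, not from the paper).
    return sum(key.encode("utf-8")) % num_workers


def map_phase(partitions):
    # Computation phase: every worker processes only its local records.
    return [[(word, 1) for line in part for word in line.split()]
            for part in partitions]


def shuffle_phase(mapped, num_workers):
    # Communication phase: each (key, value) pair goes to the worker owning the key.
    inboxes = [defaultdict(list) for _ in range(num_workers)]
    for pairs in mapped:
        for key, value in pairs:
            inboxes[partition(key, num_workers)][key].append(value)
    # Returning only after all pairs are delivered plays the role of the barrier.
    return inboxes


def reduce_phase(inboxes):
    # Computation phase: every worker reduces the keys it received.
    result = {}
    for inbox in inboxes:
        for key, values in inbox.items():
            result[key] = sum(values)
    return result


def windows(stream, size):
    # Cut a possibly unbounded iterator into finite windows (an assumption
    # of this sketch; the paper's treatment of streams may differ).
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_stream(stream, window_size=4, num_workers=2):
    # One MapReduce round per window; results are produced lazily.
    for batch in windows(stream, window_size):
        parts = [batch[i::num_workers] for i in range(num_workers)]
        yield reduce_phase(shuffle_phase(map_phase(parts), num_workers))
```

Because `process_stream` is a generator, it can be applied to an input that never ends: each call to `next` consumes one window and returns the counts for that window only. Queries whose answer depends on earlier windows would need state carried across rounds, which this sketch deliberately leaves out.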
Notes
1. The concatenation (\(\bigodot \)) used here is not the same as the common concatenation denoted by \(\sum \). It works more like an aggregation, and its actual behavior varies from scenario to scenario, but we still use the term concatenation to be consistent with [12].
2. The \(\varTheta \)-Class is the intersection of the O-Class and the \(\varOmega \)-Class, and it provides an asymptotically tight bound for functions.
3. In theory, we can assume that the number is countably infinite, provided we restrict the model such that only finitely many of them will be simultaneously active.
4. The definitions of the notations used here can be found in [9, Def. 2.2].
5. Note that it still has to be ensured that an agent leaving the computation does so only after completing its step. This, however, has to be ensured by the specification of the programs of the agents.
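The tight bound mentioned in note 2 can be spelled out with the standard definitions. The symbols \(f\), \(g\), \(c_1\), \(c_2\) and \(n_0\) are introduced here for illustration, and functions are assumed to be non-negative:

\[
\varTheta (g) \;=\; O(g) \cap \varOmega (g) \;=\; \{\, f \mid \exists c_1, c_2 > 0 \;\; \exists n_0 \;\; \forall n \ge n_0 : \; c_1\, g(n) \le f(n) \le c_2\, g(n) \,\}
\]

For instance, \(3n^2 + 5n \in \varTheta (n^2)\), since \(3n^2 \le 3n^2 + 5n \le 4n^2\) holds for all \(n \ge 5\).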
References
1. Blass, A., Gurevich, Y.: Abstract state machines capture parallel algorithms. ACM Trans. Comput. Logic 4(4), 578–651 (2003)
2. Blass, A., Gurevich, Y.: Abstract state machines capture parallel algorithms: correction and extension. ACM Trans. Comput. Logic 9(3), 1–32 (2008)
3. Börger, E., Schewe, K.-D.: Concurrent abstract state machines. Acta Inf. 53(5), 469–492 (2015). https://doi.org/10.1007/s00236-015-0249-7
4. Börger, E., Schewe, K.-D.: A behavioural theory of recursive algorithms. Fundam. Inf. 177(1), 1–37 (2020)
5. Costa, V.G., Marín, M.: A parallel search engine with BSP. In: Third Latin American Web Congress (LA-Web 2005), pp. 259–268. IEEE Computer Society (2005). https://doi.org/10.1109/LAWEB.2005.7
6. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, OSDI 2004, vol. 6, p. 10. USENIX Association (2004). http://dl.acm.org/citation.cfm?id=1251254.1251264
7. Dershowitz, N., Falkovich-Derzhavetz, E.: On the parallel computation thesis. Logic J. IGPL 24(3), 346–374 (2016). https://doi.org/10.1093/jigpal/jzw008
8. Ferrarotti, F., Schewe, K.-D., Tec, L., Wang, Q.: A new thesis concerning synchronised parallel computing - simplified parallel ASM thesis. Theor. Comput. Sci. 649, 25–53 (2016). https://doi.org/10.1016/j.tcs.2016.08.013
9. Ferrarotti, F., González, S., Schewe, K.-D.: BSP abstract state machines capture bulk synchronous parallel computations. Sci. Comput. Program. 184, 102319 (2019). https://doi.org/10.1016/j.scico.2019.102319
10. Gava, F., Pommereau, F., Guedj, M.: A BSP algorithm for on-the-fly checking CTL* formulas on security protocols. J. Supercomput. 69(2), 629–672 (2014). https://doi.org/10.1007/s11227-014-1099-8
11. Gurevich, Y.: Sequential abstract-state machines capture sequential algorithms. ACM Trans. Comput. Logic 1(1), 77–111 (2000). https://doi.org/10.1145/343369.343384
12. Gurevich, Y., Leinders, D., Van den Bussche, J.: A theory of stream queries. In: Arenas, M., Schwartzbach, M.I. (eds.) DBPL 2007. LNCS, vol. 4797, pp. 153–168. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75987-4_11
13. Inda, M.A., Bisseling, R.H.: A simple and efficient parallel FFT algorithm using the BSP model. Parallel Comput. 27(14), 1847–1878 (2001)
14. Pace, M.F.: BSP vs. MapReduce. In: Ali, H.H., et al. (eds.) Proceedings of the International Conference on Computational Science (ICCS 2012). Procedia Computer Science, vol. 9, pp. 246–255. Elsevier (2012)
15. Schewe, K.-D., Wang, Q.: A simplified parallel ASM thesis. In: Derrick, J., et al. (eds.) ABZ 2012. LNCS, vol. 7316, pp. 341–344. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30885-7_27
16. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990). https://doi.org/10.1145/79173.79181
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z., He, S., Du, Y., González, S., Schewe, KD. (2021). Unbounded Barrier-Synchronized Concurrent ASMs for Effective MapReduce Processing on Streams. In: Raschke, A., Méry, D. (eds) Rigorous State-Based Methods. ABZ 2021. Lecture Notes in Computer Science, vol 12709. Springer, Cham. https://doi.org/10.1007/978-3-030-77543-8_1
DOI: https://doi.org/10.1007/978-3-030-77543-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77542-1
Online ISBN: 978-3-030-77543-8
eBook Packages: Computer Science (R0)