Abstract
Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications.
Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible—that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abdeddaim, Y., Asarin, E., Maler, O.: Scheduling with timed automata. Theoret. Comput. Sci. 354(2), 272–300 (2006)
Alur, R., Dill, D.L.: A theory of timed automata. Theoret. Comput. Sci. 126(2), 183–235 (1994)
Baresi, L., Pourhashem Kallehbasti, M.M., Rossi, M.: How bit-vector logic can help improve the verification of LTL specifications over infinite domains. In: Proceedings of 31st Annual ACM Symposium on Applied Computing, pp. 1666–1673 (2016)
Behrmann, G., Larsen, K.G., Rasmussen, J.I.: Optimal scheduling using priced timed automata. SIGMETRICS Perform. Eval. Rev. 32(4), 34–40 (2005)
Bersani, M., Erascu, M., Marconi, F., Rossi, M.: DICE verification tool - final version. Technical report, DICE Consortium (2017). www.dice-h2020.eu
Bersani, M.M., Rossi, M., San Pietro, P.: A tool for deciding the satisfiability of continuous-time metric temporal logic. Acta Informatica 53(2), 171–206 (2016)
Bradley, S., Henderson, W., Kendall, D.: Using timed automata for response time analysis of distributed real-time systems. In: 24th IFAC/IFIP Workshop on Real-Time Programming, pp. 143–148 (1999)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of International World-Wide Web Conference (WWW), pp. 107–117 (1998)
Brito, A., Ardagna, D., Blanquer, I., Evangelinou, A., Barbierato, E., Gribaudo, M., Almeida, J., Couto, A.P., Braga, T.: D3.4 EUBra-BIGSEA QoS infrastructure services intermediate version. Technical report, 3 July 2017
Corbett, J.C.: Timing analysis of ada tasking programs. IEEE Trans. Softw. Eng. 22(7), 461–483 (1996)
Demri, S., D’Souza, D.: An automata-theoretic approach to constraint LTL. Inf. Comput. 205(3), 380–415 (2007)
Henzinger, T.A.: The theory of hybrid automata. In: Inan, M.K., Kurshan, R.P. (eds.) Verification of Digital and Hybrid Systems. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 170, pp. 265–292. Springer, Berlin (2000). https://doi.org/10.1007/978-3-642-59615-5_13
Krakora, J., Waszniowski, L., Pisa, P., Hanzalek, Z.: Timed automata approach to real time distributed system verification. In: Proceedings of IEEE International Workshop on Factory Communication Systems, pp. 407–410, September 2004
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley symposium on mathematical statistics and probability, vol. 1, pp. 281–297 (1967)
Marconi, F., Bersani, M.M., Erascu, M., Rossi, M.: Towards the formal verification of data-intensive applications through metric temporal logic. In: Ogata, K., Lawford, M., Liu, S. (eds.) ICFEM 2016. LNCS, vol. 10009, pp. 193–209. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47846-3_13
Palencia, J.C., Harbour, M.G.: Schedulability analysis for tasks with static and dynamic offsets. In: Proceedings of IEEE Real-Time Systems Symposium, pp. 26–37, December 1998
Perez, D., Bernardi, S., Merseguer, J.Z., Requeno, J.I., Casale, G., Zhu, L.: DICE simulation tools - final version. Deliverable. http://www.dice-h2020.eu/resources/
Acknowledgment
This work has been partially supported by the DICE project (Horizon 2020 project no. 644869) and by the GAUSS national research project (MIUR, PRIN 2015, Contract 2015KWREMX).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Marconi, F., Quattrocchi, G., Baresi, L., Bersani, M.M., Rossi, M. (2018). On the Timed Analysis of Big-Data Applications. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds) NASA Formal Methods. NFM 2018. Lecture Notes in Computer Science(), vol 10811. Springer, Cham. https://doi.org/10.1007/978-3-319-77935-5_22
Download citation
DOI: https://doi.org/10.1007/978-3-319-77935-5_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77934-8
Online ISBN: 978-3-319-77935-5
eBook Packages: Computer ScienceComputer Science (R0)