Abstract
We present IReS, the Intelligent Resource Scheduler, which can abstractly describe, optimize and execute any batch analytics workflow with respect to a multi-objective policy. Relying on cost and performance models of the required tasks over the available platforms, IReS allocates distinct workflow parts to the most advantageous execution and/or storage engine among those available and decides on the exact amount of resources to provision. Moreover, IReS adapts to the current cluster and engine conditions and recovers from failures by monitoring workflow execution in real time. Our current prototype has been tested on a wide range of business-driven and synthetic workflows, demonstrating its potential to yield significant gains in cost and performance compared to statically scheduled, single-engine executions. IReS incurs only marginal overhead on workflow execution performance, discovering an approximate Pareto-optimal set of execution plans within a few seconds.
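To make the notion of a Pareto-optimal set of execution plans concrete, the following minimal sketch filters a handful of candidate plans down to the non-dominated ones under two objectives, execution time and cost. The plan names, the numbers and the helper functions `dominates` and `pareto_front` are hypothetical and chosen only for illustration. The sketch filters exhaustively and is not IReS's planner, which the abstract describes as finding an approximate set.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    name: str         # hypothetical plan label
    exec_time: float  # estimated execution time (seconds), assumed value
    cost: float       # estimated resource cost (arbitrary units), assumed value


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.exec_time <= b.exec_time and a.cost <= b.cost
    strictly_better = a.exec_time < b.exec_time or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep every plan that no other plan dominates (exhaustive O(n^2) filter)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [
    Plan("all-spark", 120.0, 8.0),
    Plan("all-hadoop", 300.0, 5.0),
    Plan("spark+monetdb", 90.0, 9.5),
    Plan("hadoop+spark", 310.0, 6.0),  # slower and costlier than all-hadoop
]

front = pareto_front(plans)
for plan in front:
    print(plan.name, plan.exec_time, plan.cost)
```

A multi-objective policy would then select one plan from `front`, for example the cheapest plan that meets a deadline.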
Notes
- 1.
- 2. ASAP (Adaptive Scalable Analytics Platform) envisions a unified execution framework for scalable data analytics. www.asap-fp7.eu/
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Doka, K., Mytilinis, I., Papailiou, N., Giannakouris, V., Tsoumakos, D., Koziris, N. (2019). Multi-engine Analytics with IReS. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_9
DOI: https://doi.org/10.1007/978-3-030-24124-7_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24123-0
Online ISBN: 978-3-030-24124-7
eBook Packages: Computer Science (R0)