Multi-engine Analytics with IReS

Doka, Katerina; Mytilinis, Ioannis; Papailiou, Nikolaos; Giannakouris, Victor; Tsoumakos, Dimitrios; Koziris, Nectarios

doi:10.1007/978-3-030-24124-7_9

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 337)

Abstract

We present IReS, the Intelligent Resource Scheduler, which can abstractly describe, optimize and execute any batch analytics workflow with respect to a multi-objective policy. Relying on cost and performance models of the required tasks over the available platforms, IReS assigns each part of a workflow to the most advantageous execution and/or storage engine and decides on the exact amount of resources to provision. Moreover, IReS adapts to the current cluster and engine conditions and recovers from failures by monitoring workflow execution in real time. Our current prototype has been tested on a wide range of business-driven and synthetic workflows, demonstrating its potential to yield significant gains in cost and performance compared to statically scheduled, single-engine executions. IReS adds only marginal overhead to workflow execution, discovering an approximate Pareto-optimal set of execution plans within a few seconds.
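To make the notion of a Pareto-optimal set of execution plans concrete, the following is a minimal Python sketch and not the IReS planner itself. The task names, engine names and the (time, cost) estimates are hypothetical placeholders standing in for the cost and performance models mentioned above. The sketch enumerates every engine assignment exhaustively, which is only feasible for tiny workflows, and it ignores data movement between engines.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class Plan(NamedTuple):
    engines: Tuple[str, ...]  # engine chosen for each task, in task order
    time_s: float             # estimated total execution time (seconds)
    cost: float               # estimated total cost (arbitrary units)


# Hypothetical per-task estimates: task -> engine -> (time_s, cost).
# In a real system these would come from learned cost/performance models.
ESTIMATES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "join": {"spark": (40.0, 8.0), "monetdb": (25.0, 12.0)},
    "cluster": {"spark": (60.0, 10.0), "hadoop": (90.0, 6.0)},
}


def enumerate_plans(estimates: Dict[str, Dict[str, Tuple[float, float]]]) -> List[Plan]:
    """Build one plan per combination of engines, summing the estimates."""
    tasks = list(estimates)
    plans = []
    for choice in product(*(estimates[t] for t in tasks)):
        time_s = sum(estimates[t][e][0] for t, e in zip(tasks, choice))
        cost = sum(estimates[t][e][1] for t, e in zip(tasks, choice))
        plans.append(Plan(choice, time_s, cost))
    return plans


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.time_s <= b.time_s and a.cost <= b.cost
    better = a.time_s < b.time_s or a.cost < b.cost
    return no_worse and better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


if __name__ == "__main__":
    for plan in pareto_front(enumerate_plans(ESTIMATES)):
        print(plan)
```

A multi-objective policy would then select one plan from this front, for example the cheapest plan that meets a deadline.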

Notes

1. https://github.com/project-asap/IReS-Platform
2. ASAP (Adaptive Scalable Analytics Platform) envisions a unified execution framework for scalable data analytics. www.asap-fp7.eu/

Author information

Authors and Affiliations

Computing Systems Laboratory, National Technical University of Athens, Athens, Greece
Katerina Doka, Ioannis Mytilinis, Nikolaos Papailiou, Victor Giannakouris & Nectarios Koziris
Department of Informatics, Ionian University, Corfu, Greece
Dimitrios Tsoumakos

Corresponding author

Correspondence to Katerina Doka.

Editor information

Editors and Affiliations

Teradata, Santa Clara, CA, USA
Malu Castellanos
University of Pittsburgh, Pittsburgh, PA, USA
Panos K. Chrysanthis
University of Pittsburgh, Pittsburgh, PA, USA
Konstantinos Pelechrinis

About this paper

Cite this paper

Doka, K., Mytilinis, I., Papailiou, N., Giannakouris, V., Tsoumakos, D., Koziris, N. (2019). Multi-engine Analytics with IReS. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_9

DOI: https://doi.org/10.1007/978-3-030-24124-7_9
Published: 11 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24123-0
Online ISBN: 978-3-030-24124-7
eBook Packages: Computer Science, Computer Science (R0)
