Abstract
For a range of major scientific computing challenges that span fundamental and applied science, the deployment of Big Data Applications (BDAs) on a large-scale system, such as an internal or external cloud, a cluster or even distributed public resources (“crowd computing”), should come with guarantees of predictable performance and utilization cost. Currently, however, this is not possible, because scientific communities lack the technology, at the level of both modeling and analytics, to identify the key characteristics of BDAs and their impact on performance. Moreover, little data and few simulations are available that address the role of system operation and infrastructure in determining overall performance. Our vision is to fill this gap by developing a deeper understanding of how to optimize the deployment of BDAs on hybrid large-scale infrastructures. Our objective is to deploy BDAs on systems that operate over large infrastructures so as to achieve optimal performance while taking running costs into account. We describe a methodology to achieve this vision. The methodology starts with modeling and profiling applications and with exploring alternative systems for their execution, namely hybridizations of cloud, cluster and crowd. It continues by using predictions to create schemes that optimize performance under cost limits on system utilization. These schemes can accommodate execution by adapting the system, i.e. by extending or changing it.
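The abstract outlines the core decision of the methodology: among hybrid mixes of cluster, cloud and crowd resources, pick the one with the best predicted performance that stays within a cost limit. The paper does not specify its models here, so the following is only a minimal Python sketch under assumed models. The `Resource` type, the Amdahl-style `predict_runtime`, the per-node-hour `predict_cost`, the exhaustive `best_plan` search and all numeric parameters are hypothetical illustrations, not the author's method.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    """One infrastructure type in a hybrid system (hypothetical parameters)."""
    name: str         # e.g. "cluster", "cloud", "crowd"
    speed: float      # work units per hour per node (assumed; would come from profiling)
    price: float      # cost per node-hour (assumed)
    max_nodes: int    # how many nodes of this type are available


@dataclass(frozen=True)
class Plan:
    nodes: tuple      # number of nodes taken from each resource, in order
    runtime: float    # predicted hours
    cost: float       # predicted total cost


def predict_runtime(work: float, serial_fraction: float,
                    resources: Sequence[Resource], nodes: Sequence[int]) -> float:
    """Toy Amdahl-style model (an assumption): the serial share runs on the
    fastest node in use, the parallel share is split across aggregate throughput."""
    speeds = [r.speed for r, n in zip(resources, nodes) if n > 0]
    if not speeds:
        return float("inf")  # no nodes allocated: the job never finishes
    throughput = sum(r.speed * n for r, n in zip(resources, nodes))
    serial = work * serial_fraction / max(speeds)
    parallel = work * (1.0 - serial_fraction) / throughput
    return serial + parallel


def predict_cost(runtime: float, resources: Sequence[Resource],
                 nodes: Sequence[int]) -> float:
    """Every allocated node is billed for the whole predicted runtime."""
    return runtime * sum(r.price * n for r, n in zip(resources, nodes))


def best_plan(work: float, serial_fraction: float,
              resources: Sequence[Resource], budget: float) -> Optional[Plan]:
    """Try every node mix and keep the fastest one within budget
    (ties broken by lower cost). Returns None if no mix fits the budget."""
    best = None
    for nodes in product(*(range(r.max_nodes + 1) for r in resources)):
        runtime = predict_runtime(work, serial_fraction, resources, nodes)
        if runtime == float("inf"):
            continue
        cost = predict_cost(runtime, resources, nodes)
        if cost > budget:
            continue
        if best is None or (runtime, cost) < (best.runtime, best.cost):
            best = Plan(tuple(nodes), runtime, cost)
    return best


if __name__ == "__main__":
    hybrid = [
        Resource("cluster", speed=10.0, price=0.5, max_nodes=4),
        Resource("cloud", speed=12.0, price=2.0, max_nodes=4),
        Resource("crowd", speed=3.0, price=0.1, max_nodes=8),
    ]
    print(best_plan(work=1000.0, serial_fraction=0.05, resources=hybrid, budget=150.0))
```

With a serial share, adding nodes shortens the predicted runtime but raises the bill, so a tighter budget pushes the search toward smaller or cheaper mixes. Exhaustive enumeration is only practical for a few resource types and small node caps; a real scheme would plug in profiled models and a smarter search, and could re-run the selection during execution to extend or change the system.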
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kantere, V. (2020). Processing Big Data Across Infrastructures. In: Nepal, S., Cao, W., Nasridinov, A., Bhuiyan, M.Z.A., Guo, X., Zhang, LJ. (eds) Big Data – BigData 2020. BIGDATA 2020. Lecture Notes in Computer Science, vol 12402. Springer, Cham. https://doi.org/10.1007/978-3-030-59612-5_4
DOI: https://doi.org/10.1007/978-3-030-59612-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59611-8
Online ISBN: 978-3-030-59612-5
eBook Packages: Computer Science, Computer Science (R0)