Processing Big Data Across Infrastructures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12402)

Abstract

Major scientific computing challenges, in both fundamental and applied science, require deploying Big Data Applications (BDAs) on large-scale systems, such as an internal or external cloud, a cluster or even distributed public resources (“crowd computing”), with guarantees of predictable performance and utilization cost. Currently this is not possible, because scientific communities lack the modelling and analytics technology that identifies the key characteristics of BDAs and their impact on performance. There are also few data or simulations that address the role of system operation and infrastructure in determining overall performance. Our vision is to fill this gap by producing a deeper understanding of how to optimize the deployment of BDAs on hybrid large-scale infrastructures. Our objective is to deploy BDAs that run on systems operating on large infrastructures so that they achieve the best possible performance while taking running costs into account. We describe a methodology to achieve this vision. The methodology starts with the modeling and profiling of applications, and with the exploration of alternative systems for their execution, which are hybridizations of cloud, cluster and crowd. It continues by using predictions to create schemes that optimize performance subject to cost limits on system utilization. The schemes can accommodate execution by adapting, i.e., extending or changing, the system.
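As a rough illustration of the last step of the methodology (optimizing performance under a cost limit), the sketch below picks, from a set of candidate hybrid systems, the one with the lowest predicted runtime whose predicted cost fits a budget. It is not the method of the paper: the `Candidate` class, the `pick_deployment` function, the linear cost model (runtime times hourly price) and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate system (some mix of cluster, cloud, crowd)."""
    name: str
    predicted_runtime_h: float  # assumed output of a performance predictor
    cost_per_hour: float        # assumed flat hourly utilization price

    @property
    def predicted_cost(self) -> float:
        # Assumed linear cost model: runtime multiplied by hourly price.
        return self.predicted_runtime_h * self.cost_per_hour


def pick_deployment(candidates: Sequence[Candidate],
                    budget: float) -> Optional[Candidate]:
    """Return the fastest candidate whose predicted cost fits the budget.

    Ties on runtime are broken by lower predicted cost.
    Returns None when no candidate is affordable.
    """
    affordable = [c for c in candidates if c.predicted_cost <= budget]
    if not affordable:
        return None
    return min(affordable,
               key=lambda c: (c.predicted_runtime_h, c.predicted_cost))


# Illustrative, made-up predictions for three hybrid systems.
candidates = [
    Candidate("cluster-only", predicted_runtime_h=10.0, cost_per_hour=2.0),
    Candidate("cluster+cloud", predicted_runtime_h=6.0, cost_per_hour=5.0),
    Candidate("cluster+cloud+crowd", predicted_runtime_h=4.0, cost_per_hour=9.0),
]

choice = pick_deployment(candidates, budget=32.0)
print(choice.name if choice else "no affordable deployment")
```

Adapting the system, in the sense of the abstract, would correspond to rerunning such a selection when predictions or the budget change; how that is actually done is not specified here.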

Author information

Corresponding author

Correspondence to Verena Kantere.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kantere, V. (2020). Processing Big Data Across Infrastructures. In: Nepal, S., Cao, W., Nasridinov, A., Bhuiyan, M.Z.A., Guo, X., Zhang, LJ. (eds) Big Data – BigData 2020. BIGDATA 2020. Lecture Notes in Computer Science, vol 12402. Springer, Cham. https://doi.org/10.1007/978-3-030-59612-5_4

  • DOI: https://doi.org/10.1007/978-3-030-59612-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59611-8

  • Online ISBN: 978-3-030-59612-5

  • eBook Packages: Computer Science, Computer Science (R0)