
freeCycles - Efficient Multi-Cloud Computing Platform

Journal of Grid Computing

Abstract

The growing adoption of the MapReduce programming model increases the appeal of running MapReduce applications on Internet-wide computing platforms. However, the data distribution techniques that such platforms currently use to move the large volumes of data needed by MapReduce jobs are naive, and therefore fail to offer an efficient approach for running MapReduce over the Internet. We therefore propose freeCycles, a computing platform that runs MapReduce jobs over the Internet and makes two main contributions: i) it improves data distribution, and ii) it increases intermediate data availability by replicating tasks or data across nodes, which avoids losing intermediate data and, consequently, avoids significant delays in the overall MapReduce execution time. We present the design and implementation of freeCycles, which uses the BitTorrent protocol to distribute all data, along with an extensive set of performance results that confirm the usefulness of these contributions. The system's improved data distribution and availability make it an ideal platform for large-scale MapReduce jobs.
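The effect of replication on intermediate data availability can be illustrated with a small back-of-the-envelope model. This is only a sketch and is not taken from the article: it assumes that nodes fail independently with the same probability, and the function names `loss_probability` and `replicas_needed` are hypothetical. Under those assumptions, a piece of intermediate data is lost only if every node holding a replica fails.

```python
def loss_probability(node_failure_prob: float, replicas: int) -> float:
    """Probability that every replica is lost.

    Assumption (not from the article): nodes fail independently, each with
    the same probability `node_failure_prob`.
    """
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return node_failure_prob ** replicas


def replicas_needed(node_failure_prob: float, target_loss: float) -> int:
    """Smallest replica count whose loss probability is at most `target_loss`."""
    if not 0.0 <= node_failure_prob < 1.0:
        raise ValueError("node_failure_prob must be in [0, 1)")
    if not 0.0 < target_loss <= 1.0:
        raise ValueError("target_loss must be in (0, 1]")
    replicas = 1
    # Each extra replica multiplies the loss probability by node_failure_prob,
    # so this loop terminates for any failure probability below 1.
    while loss_probability(node_failure_prob, replicas) > target_loss:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    print(loss_probability(0.5, 3))
    print(replicas_needed(0.5, 0.1))
```

In this simplified model each additional replica shrinks the chance of losing intermediate data geometrically, at the cost of extra transfers. Real volunteer nodes do not fail independently, so the numbers should be read as an intuition rather than a prediction.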

Author information

Corresponding author

Correspondence to Rodrigo Bruno.

Cite this article

Bruno, R., Costa, F. & Ferreira, P. freeCycles - Efficient Multi-Cloud Computing Platform. J Grid Computing 15, 501–526 (2017). https://doi.org/10.1007/s10723-017-9414-2

  • DOI: https://doi.org/10.1007/s10723-017-9414-2
