DOI: 10.1145/2390021.2390026 (ACM, CIKM conference proceedings)

Large scale data analytics on clouds

Published: 29 October 2012

ABSTRACT

We summarize the important overall issues affecting the use of clouds to support Data Science. We describe the mapping of different applications to HPCC and Cloud systems, and the architecture that supports data analytics interoperable between these systems.
