DOI: 10.1145/2157136.2157310
research-article

Experiences teaching MapReduce in the cloud

Published: 29 February 2012

Abstract

We describe our experiences teaching MapReduce in a large undergraduate lecture course using public cloud services. Using the cloud, every student could carry out scalability benchmarking assignments on realistic hardware, which would otherwise have been impossible. Across two semesters, more than 500 students took our course. We believe this is the first large-scale demonstration that pay-as-you-go billing in the cloud is feasible for a large undergraduate course. Modest instructor effort was sufficient to prevent students from overspending. Average per-student expenses in the cloud were under $45, less than half our available grant funding. Students were excited by the assignment: 90% said it should be retained in future course offerings.


Published In

SIGCSE '12: Proceedings of the 43rd ACM technical symposium on Computer Science Education
February 2012
734 pages
ISBN: 9781450310987
DOI: 10.1145/2157136
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. education
  3. mapreduce

Qualifiers

  • Research-article

Conference

SIGCSE '12
SIGCSE '12: The 43rd ACM Technical Symposium on Computer Science Education
February 29 - March 3, 2012
Raleigh, North Carolina, USA

Acceptance Rates

SIGCSE '12 Paper Acceptance Rate 100 of 289 submissions, 35%;
Overall Acceptance Rate 1,595 of 4,542 submissions, 35%

