DOI: 10.1145/1352135.1352177
research-article

Cluster computing for web-scale data processing

Published: 12 March 2008

ABSTRACT

In this paper we present the design of a modern course in cluster computing and large-scale data processing. The defining differences between this and previously published designs are its focus on processing very large data sets and its use of Hadoop, an open source Java-based implementation of MapReduce and the Google File System, as the platform for programming exercises. Hadoop proved to be a key element for successfully implementing structured lab activities and independent design projects. Through this course, offered at the University of Washington in 2007, we imparted new skills to our students, improving their ability to design systems capable of solving web-scale problems.

      • Published in

        SIGCSE '08: Proceedings of the 39th SIGCSE technical symposium on Computer science education
        March 2008
        606 pages
        ISBN:9781595937995
        DOI:10.1145/1352135

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,595 of 4,542 submissions, 35%
