DOI: 10.1145/2245276.2245367
poster

Double dip map-reduce for processing cross validation jobs

Published: 26 March 2012

Abstract

Cross validation is fundamental to machine learning because it provides a reliable way to evaluate both algorithms and the overall quality of the corpora in use. In typical cross validation, the corpus is first divided into training and validation segments, which are then crossed over in successive rounds so that each segment is validated against the remaining ones. This process is often prohibitively expensive in time and effort, so it is frequently set aside in favor of computationally cheaper alternatives such as heuristics. In this paper we introduce a cloud-based architecture for running cross validation jobs. Our solution makes heavy use of cloud computing resources through a strategy with two distinct, consecutive map-reduce cycles: the first performs the target algorithmic computation, and the second produces cross validation data that is fed back into the machine learning process. We demonstrate the feasibility of the proposed approach by implementing a web segmentation algorithm.
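
The two-cycle idea can be illustrated with a minimal, single-process sketch. This is not the authors' implementation: the abstract does not describe the fold assignment, the job layout, or the learning algorithm, so the round-robin folds, the function names, and the majority-label stand-in for the target computation are all assumptions made for illustration. In a cloud deployment each map call would presumably run as a separate worker job.

```python
from collections import Counter
from functools import reduce


def k_fold_splits(corpus, k):
    """Yield (fold, training, validation) so each segment is held out once.

    Round-robin fold assignment is an assumption, not taken from the paper.
    """
    folds = [corpus[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield i, training, validation


def first_map(split):
    """Cycle 1 map: run the target computation on one fold.

    Hypothetical stand-in algorithm: predict the majority training label.
    """
    fold, training, validation = split
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return fold, [(majority, label) for _, label in validation]


def first_reduce(collected, fold_result):
    """Cycle 1 reduce: gather per-fold predictions keyed by fold index."""
    fold, pairs = fold_result
    collected[fold] = pairs
    return collected


def second_map(pairs):
    """Cycle 2 map: score one fold as (correct, total)."""
    correct = sum(1 for predicted, actual in pairs if predicted == actual)
    return correct, len(pairs)


def second_reduce(left, right):
    """Cycle 2 reduce: merge per-fold (correct, total) counts."""
    return left[0] + right[0], left[1] + right[1]


def cross_validate(corpus, k):
    """Return overall accuracy across the k held-out segments."""
    predictions = reduce(first_reduce, map(first_map, k_fold_splits(corpus, k)), {})
    correct, total = reduce(second_reduce, map(second_map, predictions.values()))
    return correct / total


if __name__ == "__main__":
    # Toy corpus of (document, label) pairs, invented for this example.
    toy = [("d0", "x"), ("d1", "x"), ("d2", "x"), ("d3", "y"), ("d4", "x"), ("d5", "x")]
    print(cross_validate(toy, k=3))
```

Keeping the scoring in a second cycle means the validation figures can be recomputed or extended without rerunning the more expensive first cycle, which is one plausible reading of why the two cycles are kept distinct.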

Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aws
  2. cloud computing
  3. cross validation
  4. k-fold
  5. machine learning
  6. map-reduce

Qualifiers

  • Poster

Funding Sources

  • Brazilian Institute for Web Science Research

Conference

SAC 2012: ACM Symposium on Applied Computing
March 26 - 30, 2012
Trento, Italy

Acceptance Rates

SAC '12 Paper Acceptance Rate: 270 of 1,056 submissions, 26%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
