column

Efficiency matters!

Published: 12 March 2010

Abstract

Current data-intensive scalable computing (DISC) systems, although scalable, achieve embarrassingly low rates of processing per node. We feel that current DISC systems have repeated a mistake of old high-performance systems: focusing on scalability without considering efficiency. This poor efficiency brings problems with reliability, energy, and cost. As the gap between theoretical performance and what is actually achieved has become glaringly large, we feel there is a pressing need to rethink the design of future data-intensive computing and to carefully consider the direction of future research.

  • Published in

    ACM SIGOPS Operating Systems Review, Volume 44, Issue 1
    January 2010
    115 pages
    ISSN: 0163-5980
    DOI: 10.1145/1740390

    Copyright © 2010. Copyright is held by the owner/author(s).

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States
