research-article

Hadoop superlinear scalability

Published: 23 March 2015

Abstract

The perpetual motion of parallel performance.

Published In

Communications of the ACM, Volume 58, Issue 4
April 2015
86 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2749359
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Popular
  • Refereed
