Massive-Scale Analytics

  • Reference work entry
Encyclopedia of Parallel Computing

Synonyms

Deep analytics; Large-scale analytics; MapReduce

Definition

Massive-scale analytics refers to a combination of mathematical methods and high-performance computational methods for the analysis of vast amounts of data in order to gain crucial insights and facilitate decision making across a broad spectrum of application domains, sometimes in an automated data-driven fashion. This requires a unified approach that addresses the two key dimensions of massive-scale analytics: data and computation.

Discussion

Introduction

The volume of data available to organizations throughout society has grown at an explosive rate for many years. Advances in computer science and technology have made this growth possible and have in turn led to the development and growth of data warehouses and of reporting capabilities for viewing information about various aspects of an enterprise. These basic reporting capabilities have helped organizations improve the...

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ghoting, A., Gunnels, J.A., Squillante, M.S. (2011). Massive-Scale Analytics. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_418
