Massive-Scale Analytics

Ghoting, Amol; Gunnels, John A.; Squillante, Mark S.

doi:10.1007/978-0-387-09766-4_418

Synonyms

Deep analytics; Large-scale analytics; MapReduce

Definition

Massive-scale analytics refers to a combination of mathematical methods and high-performance computational methods for the analysis of vast amounts of data in order to gain crucial insights and facilitate decision making across a broad spectrum of application domains, sometimes in an automated data-driven fashion. This requires a unified approach that addresses the two key dimensions of massive-scale analytics: data and computation.

Discussion

Introduction

The volume of data available to organizations throughout society has been growing at an explosive rate for many years. Important advances in computer science and technology have made this possible, and these advances have in turn led to the development and growth of data warehouses and reporting capabilities for viewing information about various aspects of an enterprise. These basic reporting capabilities have helped organizations improve the...

Author information

Authors and Affiliations

Mathematical Sciences Department, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Amol Ghoting
Mathematical Sciences Department, IBM Corp., 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA
John A. Gunnels
Mathematical Sciences Department, IBM, Yorktown Heights, NY, USA
Mark S. Squillante

Editor information

Editors and Affiliations

University of Illinois at Urbana-Champaign, Urbana, IL, USA
David Padua

About this entry

Cite this entry

Ghoting, A., Gunnels, J.A., Squillante, M.S. (2011). Massive-Scale Analytics. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_418

DOI: https://doi.org/10.1007/978-0-387-09766-4_418
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
