Synonyms
Definition
Massive-scale analytics refers to the combination of mathematical and high-performance computational methods used to analyze vast amounts of data, in order to gain crucial insights and facilitate decision making across a broad spectrum of application domains, sometimes in an automated, data-driven fashion. This requires a unified approach that addresses the two key dimensions of massive-scale analytics: data and computation.
Discussion
Introduction
The volume of data available to organizations throughout society has been growing at an explosive rate for many years. Important advances in computer science and technology have made this possible, and have in turn driven the development and growth of data warehouses and of reporting capabilities for viewing information about various aspects of an enterprise. These basic reporting capabilities have helped organizations improve the...
Copyright information
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Ghoting, A., Gunnels, J.A., Squillante, M.S. (2011). Massive-Scale Analytics. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_418
DOI: https://doi.org/10.1007/978-0-387-09766-4_418
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science; Reference Module Computer Science and Engineering