Tagged MapReduce: Efficiently Computing Multi-analytics Using MapReduce

Williams, Andreas; Mitsoulis-Ntompos, Pavlos; Chatziantoniou, Damianos

doi:10.1007/978-3-642-23544-3_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6862)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

MapReduce is a programming paradigm for effective processing of large datasets in distributed environments, built on two functions, map and reduce. The map phase creates (key, value) pairs, while the reduce phase aggregates the values that share a key. In other words, a MapReduce application defines and reduces one set of values for each key, which means that the user learns only one aspect of the key. Advanced OLAP applications, however, require multiple sets, not necessarily mutually disjoint, to be defined and reduced for the same key. The challenge is to extend MapReduce to support this in a syntactically simple and computationally efficient way. We propose an extension to the classic MapReduce model, called Tagged MapReduce, where data is represented as (key, value, tag) triplets. Users map triplets, and reduction takes place for each key and for each tag. For example, given a set of pages, one may want to count word occurrences per page type, where the page type is represented by the tag. While classic MapReduce can handle this class of queries, efficient implementations require effort and possibly advanced programming skills. For example, should the tag form a compound object with the key or with the value? Our formalism is simpler for the programmer to use and makes it easier for the system to identify and apply efficient algorithms.
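
The word-count-per-page-type example from the abstract can be illustrated with a minimal, single-process sketch in Python. This is not the authors' implementation or API: the function names `tagged_map` and `tagged_reduce`, the page dictionaries with `text` and `types` fields, and the in-memory grouping are all assumptions made for illustration. The sketch only shows the shape of the idea: the map step emits (key, value, tag) triplets, and the reduce step aggregates once per key and per tag. A page that carries two types contributes to two tag groups, so the sets reduced for one key need not be disjoint.

```python
from collections import defaultdict


def tagged_map(page):
    """Emit (key, value, tag) triplets: one per word occurrence and page type.

    `page` is a hypothetical record with a `text` string and a list of `types`.
    """
    for word in page["text"].lower().split():
        for tag in page["types"]:  # a page may carry several, overlapping tags
            yield word, 1, tag


def tagged_reduce(triplets, reducer=sum):
    """Group values by key and then by tag, and reduce each group separately."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, value, tag in triplets:
        groups[key][tag].append(value)
    return {
        key: {tag: reducer(values) for tag, values in by_tag.items()}
        for key, by_tag in groups.items()
    }


# Illustrative input: the second page belongs to two page types.
pages = [
    {"text": "map reduce map", "types": ["blog"]},
    {"text": "reduce tag", "types": ["blog", "news"]},
]

triplets = [t for page in pages for t in tagged_map(page)]
counts = tagged_reduce(triplets)
print(counts)
```

In classic MapReduce the same query would typically be written by folding the tag into a compound key such as `(word, page_type)` or into the value, which is the design question the abstract raises. Keeping the tag as a separate field leaves that choice to the system.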

Author information

Authors and Affiliations

Department of Management Science and Technology, Athens University of Economics and Business (AUEB), Patission Ave, 104 34, Athens, Greece
Andreas Williams, Pavlos Mitsoulis-Ntompos & Damianos Chatziantoniou

Editor information

Editors and Affiliations

ICAR-CNR and University of Calabria, Via P. Bucci 41 C, 87036, Rende (CS), Italy
Alfredo Cuzzocrea
Hewlett-Packard Labs, 1501 Page Mill Road, MS 1142, 94304, Palo Alto, CA, USA
Umeshwar Dayal

About this paper

Cite this paper

Williams, A., Mitsoulis-Ntompos, P., Chatziantoniou, D. (2011). Tagged MapReduce: Efficiently Computing Multi-analytics Using MapReduce. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2011. Lecture Notes in Computer Science, vol 6862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23544-3_18

DOI: https://doi.org/10.1007/978-3-642-23544-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23543-6
Online ISBN: 978-3-642-23544-3
eBook Packages: Computer Science, Computer Science (R0)
