Abstract
So-called “big data” is increasingly present in many modern applications, where massive parallel processing is the main approach to achieving acceptable performance. However, as data sizes keep growing, even parallelism will meet its limits unless it is combined with other powerful processing techniques. In this paper we propose to combine parallelism with rewriting, that is, reusing previous results stored in a cache in order to perform new (parallel) computations. To this end, we introduce an abstract framework based on the lattice of partitions of the data set. Our main contributions are: (a) showing that our framework allows the rewriting of parallel computations; (b) deriving the basic principles of optimal cache management; and (c) showing that, in the case of structured data, our approach can leverage both structure and semantics in the data to improve performance.
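The core idea of rewriting over the partition lattice can be illustrated with a small sketch (this example is not from the paper; the data set, the sum aggregate, and the helper names are illustrative assumptions). An aggregate cached over a finer partition, e.g. grouping by (region, month), can be coarsened to answer a query over a coarser partition, e.g. grouping by region, without rescanning the raw data:

```python
# Hypothetical sketch of rewriting via the partition lattice:
# a cached aggregate over a finer partition is reused to answer
# a query over a coarser one, instead of rescanning the data.

from collections import defaultdict

data = [
    {"region": "east", "month": 1, "sales": 10},
    {"region": "east", "month": 2, "sales": 20},
    {"region": "west", "month": 1, "sales": 30},
]

def group_sum(rows, keys):
    """Sum 'sales' over the partition induced by the given keys."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[k] for k in keys)] += r["sales"]
    return dict(out)

# Cache the aggregate computed over the finer partition (region, month).
cache = {("region", "month"): group_sum(data, ("region", "month"))}

def rewrite(target_keys):
    """Answer a grouping query, coarsening a cached finer result if one exists."""
    for cached_keys, cached_agg in cache.items():
        if set(target_keys) <= set(cached_keys):  # cached partition refines the target
            idx = [cached_keys.index(k) for k in target_keys]
            out = defaultdict(int)
            for group, total in cached_agg.items():
                out[tuple(group[i] for i in idx)] += total
            return dict(out)
    return group_sum(data, target_keys)  # cache miss: fall back to a full scan

print(rewrite(("region",)))  # {('east',): 30, ('west',): 30}
```

Because sum is distributive, merging the cached per-group totals gives the same answer as recomputing from scratch; in a parallel setting each cached entry would itself be the result of a prior (map-reduce-style) computation over the data partitions.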
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Spyratos, N., Sugibuchi, T. (2013). Parallelism and Rewriting for Big Data Processing. In: Tanaka, Y., Spyratos, N., Yoshida, T., Meghini, C. (eds) Information Search, Integration and Personalization. ISIP 2012. Communications in Computer and Information Science, vol 146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40140-4_2
Print ISBN: 978-3-642-40139-8
Online ISBN: 978-3-642-40140-4