Cluster Computing, Recursion and Datalog

Afrati, Foto N.; Borkar, Vinayak; Carey, Michael; Polyzotis, Neoklis; Ullman, Jeffrey D.

doi:10.1007/978-3-642-24206-9_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6702)

Included in the following conference series:

International Datalog 2.0 Workshop

Abstract

The cluster-computing environment typified by Hadoop, the open-source implementation of map-reduce, is receiving serious attention as the way to execute queries and other operations on very large-scale data. Datalog execution presents several unusual issues for this environment. We discuss the best way to execute a round of seminaive evaluation on a computing cluster using map-reduce. Using transitive closure as an example, we examine the cost of executing recursions in several different ways. Recursive processes such as the evaluation of a recursive Datalog program do not fit the key map-reduce assumption that tasks deliver output only when they are completed. As a result, the resilience under compute-node failure that is a key element of the map-reduce framework is not supported for recursive programs. We discuss extensions to this framework that are suitable for executing recursive Datalog programs on very large-scale data in a way that allows progress to continue after node failures, without restarting the entire job.
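
As an illustration of the seminaive evaluation the abstract refers to, here is a minimal single-machine sketch in Python for transitive closure. It is not the chapter's cluster algorithm, which is not shown on this page. It assumes the usual right-linear rules path(X, Y) :- arc(X, Y) and path(X, Y) :- path(X, Z), arc(Z, Y). The function name `transitive_closure_seminaive` and the variables `successors`, `path` and `delta` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def transitive_closure_seminaive(edges):
    """Return (closure, rounds) for a set of (x, y) arcs.

    Assumed rules (right-linear form, chosen for this sketch):
        path(X, Y) :- arc(X, Y).
        path(X, Y) :- path(X, Z), arc(Z, Y).
    """
    # Index arcs by source node so the join on Z is a dictionary lookup.
    successors = defaultdict(set)
    for x, y in edges:
        successors[x].add(y)

    path = set(edges)    # every path fact derived so far
    delta = set(edges)   # facts that were new in the previous round
    rounds = 0
    while delta:
        rounds += 1
        # Seminaive step: join only the new facts with arc, because
        # older facts were already joined in an earlier round.
        derived = {(x, y) for (x, z) in delta for y in successors.get(z, ())}
        delta = derived - path
        path |= delta
    return path, rounds


closure, rounds = transitive_closure_seminaive({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))
print(rounds)
```

Each pass of the loop corresponds to one round of evaluation. In a map-reduce setting, the join of `delta` with the arc relation and the duplicate elimination against `path` would each have to be distributed across tasks, which is the kind of cost the abstract says the chapter examines.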

Author information

Authors and Affiliations

National Technical University of Athens, Greece
Foto N. Afrati
UC Irvine, USA
Vinayak Borkar & Michael Carey
UC Santa Cruz, USA
Neoklis Polyzotis
Stanford University, USA
Jeffrey D. Ullman

Editor information

Editors and Affiliations

Department of Computer Science, Oxford University, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Oege de Moor
Department of Computer Science, University of Oxford, UK
Georg Gottlob, Tim Furche & Andrew Sellers

About this paper

Cite this paper

Afrati, F.N., Borkar, V., Carey, M., Polyzotis, N., Ullman, J.D. (2011). Cluster Computing, Recursion and Datalog. In: de Moor, O., Gottlob, G., Furche, T., Sellers, A. (eds) Datalog Reloaded. Datalog 2.0 2010. Lecture Notes in Computer Science, vol 6702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24206-9_8

DOI: https://doi.org/10.1007/978-3-642-24206-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24205-2
Online ISBN: 978-3-642-24206-9
eBook Packages: Computer Science (R0)
