Cluster Computing, Recursion and Datalog

  • Conference paper
Datalog Reloaded (Datalog 2.0 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6702)

Abstract

The cluster-computing environment typified by Hadoop, the open-source implementation of map-reduce, is receiving serious attention as the way to execute queries and other operations on very large-scale data. Datalog execution presents several unusual issues for this environment. We discuss the best way to execute a round of seminaive evaluation on a computing cluster using map-reduce. Using transitive closure as an example, we examine the cost of executing recursions in several different ways. Recursive processes such as the evaluation of a recursive Datalog program do not fit the key map-reduce assumption that tasks deliver output only when they are completed. As a result, the resilience under compute-node failure that is a key element of the map-reduce framework is not supported for recursive programs. We discuss extensions to this framework that are suitable for executing recursive Datalog programs on very large-scale data in a way that allows progress to continue after node failures, without restarting the entire job.
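As an illustration of the idea the abstract refers to, the following is a minimal single-machine sketch of seminaive evaluation for transitive closure, with each round written in a map-then-reduce shape. It is not taken from the paper: the function names `seminaive_round` and `transitive_closure`, the right-linear rule `path(x, y) :- path(x, z), edge(z, y)`, and the in-memory grouping that stands in for the shuffle are all assumptions made for this example.

```python
from collections import defaultdict


def seminaive_round(edges, delta, known):
    """One seminaive round: join only the newly derived paths with the edges."""
    # "Map" step (hypothetical stand-in for a shuffle): key each delta fact
    # path(x, z) and each edge(z, y) by the shared join value z.
    groups = defaultdict(lambda: ([], []))
    for x, z in delta:
        groups[z][0].append(x)
    for z, y in edges:
        groups[z][1].append(y)
    # "Reduce" step: for each key z, pair every source x with every target y.
    derived = {
        (x, y)
        for sources, targets in groups.values()
        for x in sources
        for y in targets
    }
    # Only facts never seen before feed the next round.
    return derived - known


def transitive_closure(edges):
    """Iterate seminaive rounds until no new path facts appear."""
    edges = set(edges)
    known = set(edges)   # all path facts derived so far
    delta = set(edges)   # facts that were new in the previous round
    while delta:
        delta = seminaive_round(edges, delta, known)
        known |= delta
    return known
```

Calling `transitive_closure([(1, 2), (2, 3), (3, 4)])` derives the three extra pairs `(1, 3)`, `(2, 4)` and `(1, 4)` over two productive rounds. Joining only `delta` rather than all of `known` is what distinguishes seminaive from naive evaluation; in a real cluster setting each round would be a separate job, which is where the failure-recovery issue raised in the abstract arises.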

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Afrati, F.N., Borkar, V., Carey, M., Polyzotis, N., Ullman, J.D. (2011). Cluster Computing, Recursion and Datalog. In: de Moor, O., Gottlob, G., Furche, T., Sellers, A. (eds) Datalog Reloaded. Datalog 2.0 2010. Lecture Notes in Computer Science, vol 6702. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24206-9_8

  • DOI: https://doi.org/10.1007/978-3-642-24206-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24205-2

  • Online ISBN: 978-3-642-24206-9

  • eBook Packages: Computer Science, Computer Science (R0)
