Skip to main content

Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems

  • Chapter
  • First Online:
Resource Management for Big Data Platforms

Abstract

Data Mining (DM) algorithms are able to construct models from available data that can be very useful for both business and science. However, a powerful representation language is required to express the highly complex models that stem from structured data. Multirelational algorithms can then take advantage of this representation for both data and models. The drawback is that for very large or highly complex domains multirelational algorithms may require long running times. This problem can be substantially reduced using parallel implementations. In this chapter, we present a survey on parallel approaches to run Inductive Logic Programming (ILP), a flavor of multirelational algorithms. We also analyze different scheduling approaches for those implementations and describe two applications where the proposed approaches may be very useful.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    We refer to John Loyd’s book [30] for basic concepts and definitions of Logic Programming.

  2. 2.

    A partial order is a reflexive, antisymmetric, and transitive binary relation.

  3. 3.

    Counting the number of examples derivable from the hypothesis and the background knowledge.

  4. 4.

    As many as the size of the sample

  5. 5.

    Propositional level algorithms

  6. 6.

    or a sheet of a spreadsheet

  7. 7.

    An approach where the activity of a compound, for example, is predicted only based on its structure features.

  8. 8.

    A detailed description of this study can be found in [66]

  9. 9.

    Source data for both data sets is available from the Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network from the US Environmental Protection Agency http://www.epa.gov/ncct/dsstox/index.html, accessed Dec 2008.

  10. 10.

    Available from the Oxford University Machine Learning repository http://www.cs.ox.ac.uk/activities/machlearn/applications.html

  11. 11.

    Except for the carcinogenesis data set

References

  1. Alves, A., Camacho, R., Oliveira, E.: Discovery of functional relationships in multi-relational data using inductive logic programming. In: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 Nov 2004, Brighton, UK, pp. 319–322 (2004)

    Google Scholar 

  2. EC Amazon: Amazon elastic compute cloud (amazon ec2), 2010. https://aws.amazon.com/ec2/

  3. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., et al.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)

    Google Scholar 

  4. Blaták J., Popelínský, L.: dRAP: A framework for distributed mining firts-order frequent patterns. In: Proceedings of the 16th Conference on Inductive Logic Programming, pp. 25–27. Springer, (2006)

    Google Scholar 

  5. Bone, P., Somogyi, Z., Schachte, P.: Estimating the overlap between dependent computations for automatic parallelization. TPLP 11(4–5), 575–591 (2011)

    Google Scholar 

  6. Bratko, I., Muggleton, S., Varsek, A.: Learning qualitative models of dynamic systems. In: Proceedings of the Eighth International Machine Learning Workshop, San Mateo, Ca, 1991. Morgan-Kaufmann

    Google Scholar 

  7. Buntine, W.: Generalised subsumption and its applications to induction and redundancy. Artif. Intell. J. 36(2):149–176 (1988). revised version of the paper that won the A.I. Best Paper Award at ECAI-86

    Google Scholar 

  8. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A.F., Buyya, R.: Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw.: Pract. Experience 41(1):23–50 (2011)

    Google Scholar 

  9. Clare, A., King, R.D.: Data mining the yeast genome in a lazy functional language. In: Proceedings of the Fifth International Symposium on Practical Aspects of Declarative Languages, pp. 19–36 (2003)

    Google Scholar 

  10. Costa, V.S., de Castro Dutra, I., Rocha, R.: Threads and or-parallelism unified. TPLP 10(4–6), 417–432 (2010)

    Google Scholar 

  11. Dasgupta, K., Mandal, B., Dutta, P., Kumar Mandal, J., Dam, S.: A genetic algorithm (ga) based load balancing strategy for cloud computing. Proc. Technol. 10, 340–347 (2013)

    Google Scholar 

  12. Jeffrey, D., Sanjay, G.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Google Scholar 

  13. Dehaspe, L., De Raedt, L.: Parallel inductive logic programming. In: Proceedings of the MLnet Familiarization Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases (1995)

    Google Scholar 

  14. Delgado, J., Salah Eddin, A., Adjouadi, M., Sadjadi, S.M.: Paravirtualization for scientific computing: performance analysis and prediction. In: 2011 IEEE 13th International Conference on High Performance Computing and Communications (HPCC), pp. 536–543. IEEE (2011)

    Google Scholar 

  15. Fayyad, U.M., Uthurusamy, R., (eds.) In: Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), Montreal, Canada, August 20-21, 1995. AAAI Press (1995)

    Google Scholar 

  16. Nuno, A., Ashwin Srinivasan, F., Silva, F.M.A., Camacho, R.: Parallel ilp for distributed-memory architectures. Mach. Learn. 74(3), 257–279 (2009)

    Google Scholar 

  17. Message Passing Interface Forum: MPI: A message-passing interface standard. Technical Report UT-CS-94-230, University of Tennessee, Knoxville, TN, USA (1994)

    Google Scholar 

  18. Ian Foster and Carl Kesselman: The Grid 2: Blueprint for a new computing infrastructure. Elsevier (2003)

    Google Scholar 

  19. Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I.: Above the clouds: A berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS 28:13 (2009)

    Google Scholar 

  20. Garnett, M.J., Edelman, E.J., Heidorn, S.J., Greenman, C.D., Dastur, A., Lau, K.W., Patricia Greninger, I., Thompson, R., Luo, X., Soares, J., et al.: Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483(7391), 570–575 (2012)

    Article  Google Scholar 

  21. Cloud Google: Google cloud platform. https://cloud.google.com/

  22. Graham, J., Page, D., Kamal, A.: Accelerating the drug design process through parallel inductive logic programming data mining. In: Proceeding of the Computational Systems Bioinformatics (CSB’03). IEEE (2003)

    Google Scholar 

  23. Gupta, G., Pontelli, E., Ali, K.A.M., Carlsson, M., Hermenegildo, M.V.: Parallel execution of prolog programs: a survey. ACM Trans. Program. Lang. Syst. 23(4), 472–602 (2001)

    Google Scholar 

  24. Huang, W., Liu, J., Abali, B., Panda, D.K.: A case for high performance computing with virtual machines. In: Proceedings of the 20th annual international conference on Supercomputing, pp. 125–134. ACM (2006)

    Google Scholar 

  25. Juve, G., Deelman, E.: Scientific workflows and clouds. Crossroads 16(3), 14–18 (2010)

    Article  Google Scholar 

  26. King, R., Muggleton S., Lewis, R., Sternberg, M.: Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proc. National Acad. Sci. 89(23) (1992)

    Google Scholar 

  27. King, R., Sternberg, M.J.E.: A machine learning approach for the prediction of protein secondary structure. J. Mol. Biol. 216, 441–457 (1990)

    Article  Google Scholar 

  28. Konstantopoulos, S.K.: A data-parallel version of Aleph. In: Proceedings of the Workshop on Parallel and Distributed Computing for Machine Learning, co-located with ECML/PKDD’2003, Dubrovnik, Croatia, 2003

    Google Scholar 

  29. Krishna, P.V.: Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl. Soft Comput. 13(5), 2292–2303 (2013)

    Google Scholar 

  30. Lloyd, J.W.: Foundations of Logic Programming. Springer, New York (1984)

    Book  MATH  Google Scholar 

  31. Martinez-Angeles, C.A., de Castro Dutra, I., Costa, V.S., Buenabad-Chavez, J.: A datalog engine for gpus. In: Declarative Programming and Knowledge Management - Declarative Programming Days, KDPD 2013, Unifying INAP, WFLP, and WLP, Kiel, Germany, September 11-13, 2013, Revised Selected Papers, vol. 8439 of Lecture Notes in Computer Science, pp. 152–168. Springer (2013)

    Google Scholar 

  32. Matsui, T., Inuzuka, N., Seki, H., Itoh, H.: Comparison of three parallel implementations of an induction algorithm. In: 8th International Parallel Computing Workshop, pp. 181–188. Singapore (1998)

    Google Scholar 

  33. Mauch, Viktor, Kunze, Marcel, Hillenbrand, Marius: High performance cloud computing. Future Gen. Comput. Syst. 29(6), 1408–1416 (2013)

    Article  Google Scholar 

  34. Menden, M.P., Iorio, F., Garnett, M., McDermott, U., Benes, C.H., Ballester, P.J., Saez-Rodriguez, J.: Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PloS one 8(4), e61318 (2013)

    Google Scholar 

  35. Michalski, R.S.: Pattern recognition as rule-guided inductive inference. In: Proceedings of IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 349–361 (1980)

    Google Scholar 

  36. Mitchell, T.M.: Generalization as search. Artificial intell. 18(2), 203–226 (1982)

    Google Scholar 

  37. Muggleton, S.: Inductive logic programming. New Gener. Comput. 8(4), 295–317 (1991)

    Article  MATH  Google Scholar 

  38. Muggleton, S.: Inductive logic programming: derivations, successes and shortcomings. In: Proceedings of the European Conference on Machine Learning: ECML-93, pp. 21–37, Vienna, Austria, April 1993

    Google Scholar 

  39. Muggleton, S.: Inverse entailment and progol. New Gener. Comput. Special issue on Inductive Logic Programming 13(3–4), 245–286 (1995)

    Google Scholar 

  40. Muggleton, S.: Learning from positive data. In: Inductive Logic Programming, 6th International Workshop, ILP-96, Stockholm, Sweden, August 26-28, 1996, Selected Papers, pp. 358–376 (1996)

    Google Scholar 

  41. Muggleton, S., Firth, J.: Relational rule induction with CProgol4.4: a tutorial introduction. In: Džeroski, S., Lavrač, N. (eds.) Relational Data Mining, pp. 160–188. Springer (2001)

    Google Scholar 

  42. Muggleton, S., De Raedt, L.: Inductive logic programming: theory and methods. J. Logic Program. 19(20), 629–679 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  43. Ohwada, H., Mizoguchi, F.: Parallel execution for speeding up inductive logic programming systems. In: Proceedings of the 9th International Workshop on Inductive Logic Programming, number 1721 in LNAI, pp. 277–286. Springer (1999)

    Google Scholar 

  44. Ohwada, H., Nishiyama, H., Mizoguchi, F.: Concurrent execution of optimal hypothesis search for inverse entailment. In: Cussens, J., Frisch, A. (eds.) Proceedings of the 10th International Conference on Inductive Logic Programming, vol. 1866 of LNAI, pp. 165–173. Springer (2000)

    Google Scholar 

  45. Pacini, Elina, Mateos, Cristian, Garino, Carlos García: Distributed job scheduling based on swarm intelligence: a survey. Comput. Electr. Eng. 40(1), 252–269 (2014)

    Article  Google Scholar 

  46. Plotkin, G.D.: A note on inductive generalisation, pp. 153–163. In: Meltzer, B., Michie, D. (eds.) Edinburgh University Press, Edinburgh (1969)

    Google Scholar 

  47. Quinlan, J.R., Cameron-Jones, R.M.: FOIL: A midterm report. In: Brazdil, P. (ed.) Proceedings of the 6th European Conference on Machine Learning, vol. 667, pp. 3–20. Springer (1993)

    Google Scholar 

  48. Ramakrishnan, L., Zbiegel, P.T., Campbell, S., Bradshaw, R., Canon, R.S., Coghlan, S., Sakrejda, I., Desai, N., Declerck, T., Liu, A.: Magellan: experiences from a science cloud. In: Proceedings of the 2nd International Workshop on Scientific Cloud Computing, pp. 49–58. ACM (2011)

    Google Scholar 

  49. Ramezani, F., Lu, J., Hussain, F.: Task based system load balancing approach in cloud environments. In: Knowledge Engineering and Management, pp. 31–42. Springer (2014)

    Google Scholar 

  50. Ramezani, F., Jie, L., Hussain, F.K.: Task-based system load balancing in cloud computing using particle swarm optimization. Int. J. Parallel Programm. 42(5), 739–754 (2014)

    Google Scholar 

  51. RD1, K., Muggleton, S.H., Srinivasan, A., Sternberg, M.J.: Structure-activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proc Natl Acad Sci USA, 9(93(1)):438–42 (1996)

    Google Scholar 

  52. Reinaldo, F., Fernandes, C., Rahman, A., Malucelli, A., Camacho, R.: Assessing the eligibility of kidney transplant donors. In: Machine Learning and Data Mining in Pattern Recognition, 6th International Conference, MLDM 2009, Leipzig, Germany, July 23–25, 2009. Proceedings, pp. 802–809 (2009)

    Google Scholar 

  53. Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12(1), 23–41 (1965)

    Article  MathSciNet  MATH  Google Scholar 

  54. Vítor, S.C., Ashwin, S., Rui, C., Hendrik, B., Bart, D., Gerda, J., Jan, S., Henk, V., Wim, V.L: Query transformations for improving the efficiency of ILP systems. J. Mach. Learn. Res. 4, 465–491 (2003)

    Google Scholar 

  55. Skillicorn, David B., Wang, Yu.: Parallel and sequential algorithms for data mining using inductive logic. Knowl. Inf. Syst. 3(4), 405–421 (2001)

    Article  MATH  Google Scholar 

  56. Smith, R.G.: The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans. Comput. 29(12), 1104–1113 (1980)

    Article  Google Scholar 

  57. Srinivasan, A.: The Aleph Manual, 2003. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph

  58. Srinivasan, A., King, R.D., Muggleton, S., Sternberg, M.J.E.: Carcinogenesis predictions using ILP. In: Inductive Logic Programming, 7th International Workshop, ILP-97, Prague, Czech Republic, Sept. 17–20, 1997, Proceedings, pp. 273–287 (1997)

    Google Scholar 

  59. Fonseca, N.A., Pereira, M., Santos Costa, V., Camacho, R.: Interactive discriminative mining of chemical fragments. In: Proceedings of the 2010 International Conference on Inductive Logic Programming (ILP 2010), number 6489 in Lecture Notes in Artificial Intelligence, pp. 59–66. Springer (2011)

    Google Scholar 

  60. White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Inc. (2012)

    Google Scholar 

  61. Wielemaker, J.: Native preemptive threads in SWI-Prolog. In: Palamidessi, C. (ed.) Proceedings of the 19th International Conference on Logic Programming, vol. 2916 of LNAI, pp. 331–345. Springer (2003)

    Google Scholar 

  62. Wirth., R.: Learning by failure to prove. In: Proceedings Third European Working Session on Learning, pp. 237–251. London (1988) Pitman

    Google Scholar 

  63. Woo, Y.T., Lai, D., McLain, J.L., Manibusan, M.K., Dellarco, V.: Use of mechanism-based structure-activity relationships analysis in carcinogenic potential ranking for drinking water disinfection by-products. Environ. Health Perspect 110((Suppl 1)), 75–87 (2002)

    Google Scholar 

  64. Wu, F., Wu, Q., Tan, Y.: Workflow scheduling in cloud: a survey. J. Supercomput. 1–46 (2015)

    Google Scholar 

  65. Xu, Y., Wu, L., Guo, L., Chen, Z., Yang, L., Shi, Z.: An intelligent load balancing algorithm towards efficient cloud computing. In: Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)

    Google Scholar 

  66. Zaverucha, G., Santos Costa, V., Paes, A. (eds.) Inductive logic programming—23rd International Conference, ILP 2013, Rio de Janeiro, Brazil, August 28-30, 2013, Revised Selected Papers, volume 8812 of Lecture Notes in Computer Science. Springer (2014)

    Google Scholar 

  67. Zhan, Z.H., Fang Liu, X., Jiao Gong, Y., Zhang, J., Shu-Hung Chung, H., Li. Y.: Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Computing Surveys (CSUR) 47(4), 63 (2015)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jorge G. Barbosa .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Camacho, R., Barbosa, J.G., Sampaio, A., Ladeiras, J., Fonseca, N.A., Costa, V.S. (2016). Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-44881-7_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44880-0

  • Online ISBN: 978-3-319-44881-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics