Running Multi-relational Data Mining Processes in the Cloud: A Practical Approach for Social Networks

  • Conference paper
High Performance Computing (CARLA 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 565)

Abstract

Multi-relational Data Mining (MRDM) algorithms are an appropriate approach for inferring knowledge from databases that contain multiple relationships between non-homogeneous entities, which is precisely the case for datasets obtained from social networks. However, this expressivity comes at a price: the search space of candidate hypotheses in MRDM algorithms is more complex than that of traditional data mining algorithms. To keep the search space feasible, MRDM algorithms adopt several language biases during the mining process. As a consequence, when running an MRDM-based system, the user needs to execute the same set of data mining tasks a number of times, each with a different combination of parameters, in order to obtain a good final hypothesis. This makes manual control of such a complex process tedious, laborious and error-prone. In addition, running the same MRDM process several times is time-consuming. Thus, the automatic execution of each parameter setting through parallelization techniques becomes essential. In this paper, we propose an approach named LPFlow4SN that models an MRDM process as a scientific workflow and then executes it in parallel in the cloud, thus benefiting from existing Scientific Workflow Management Systems. Experimental results reinforce the potential of running parallel scientific workflows in the cloud to automatically control the MRDM process while improving its overall execution performance.
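The abstract's central observation is that an MRDM run is effectively a parameter sweep: the same learner is executed once per combination of language-bias settings, and because those executions are independent of one another they can run in parallel. The Python sketch below illustrates that pattern under stated assumptions; it is not the LPFlow4SN or SciCumulus implementation. The parameter names (`clause_length`, `min_pos`, `noise`), the function `run_mrdm_task`, and its placeholder scoring rule are hypothetical, and a local thread pool stands in for the cloud virtual machines to which a workflow management system would dispatch the tasks.

```python
"""Sketch: a parallel parameter sweep over hypothetical MRDM language-bias settings.

Assumption: run_mrdm_task is a hypothetical stand-in for one invocation of an
ILP/MRDM learner; it is NOT the LPFlow4SN or SciCumulus API.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Setting:
    clause_length: int  # hypothetical: max literals in a learned clause
    min_pos: int        # hypothetical: min positive examples a clause must cover
    noise: int          # hypothetical: max negative examples a clause may cover


@dataclass(frozen=True)
class Result:
    setting: Setting
    score: float


def run_mrdm_task(setting: Setting) -> Result:
    # Placeholder scoring so the sketch runs without a real learner.
    # A real task would invoke the learner with these settings and score
    # the induced theory on held-out examples.
    score = setting.clause_length * 0.1 - setting.noise * 0.05 + setting.min_pos * 0.01
    return Result(setting, round(score, 4))


def sweep(grid: dict[str, list[int]], max_workers: int = 4) -> list[Result]:
    # Cartesian product of parameter values: one independent task per combination.
    keys = list(grid)
    settings = [Setting(**dict(zip(keys, values))) for values in product(*grid.values())]
    # Tasks share no state, so they can run concurrently; threads stand in
    # for the cloud VMs a workflow engine would dispatch them to.
    # pool.map returns results in the same order as the input settings.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_mrdm_task, settings))


def best(results: list[Result]) -> Result:
    # Select the hypothesis whose setting achieved the highest score.
    return max(results, key=lambda r: r.score)


if __name__ == "__main__":
    grid = {"clause_length": [2, 3, 4], "min_pos": [1, 5], "noise": [0, 2]}
    results = sweep(grid)
    print(len(results))           # 12
    print(best(results).setting)  # Setting(clause_length=4, min_pos=5, noise=0)
```

Because each setting is an independent task, the sweep's wall-clock time shrinks as workers are added, and choosing the final hypothesis is a cheap reduction over the collected results. In a real deployment the toy scoring formula would be replaced by an evaluation of the induced theory on held-out data.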

Acknowledgments

The authors would like to thank FAPERJ (grant E-26/111.370/2013) and CNPq (grant 478878/2013-3) for partially sponsoring this research.

Author information

Corresponding author

Correspondence to Daniel de Oliveira.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Paes, A., de Oliveira, D. (2015). Running Multi-relational Data Mining Processes in the Cloud: A Practical Approach for Social Networks. In: Osthoff, C., Navaux, P., Barrios Hernandez, C., Silva Dias, P. (eds) High Performance Computing. CARLA 2015. Communications in Computer and Information Science, vol 565. Springer, Cham. https://doi.org/10.1007/978-3-319-26928-3_1

  • DOI: https://doi.org/10.1007/978-3-319-26928-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26927-6

  • Online ISBN: 978-3-319-26928-3

  • eBook Packages: Computer Science, Computer Science (R0)
