DOI: 10.1145/3452369.3463820
short-paper
Open access

Data Science Workflows for the Cloud/Edge Computing Continuum

Published: 18 June 2021

Abstract

Research infrastructures play a crucial role in the development of data science. The conjunction of data, infrastructures, and analytical methods enables multidisciplinary scientists and innovators to extract knowledge and to make that knowledge and the underlying experiments reusable by the scientific community and by innovators, with an impact on science and society. Resources such as data and methods help domain and data scientists to transform a research or innovation question into a responsible data-driven analytical process. Edge computing, on the other hand, is a rapidly spreading computing paradigm. It is based on the assumption that, for certain applications, it is beneficial to bring computation as close as possible to the data or the end users. This paper introduces an approach for writing data science workflows that target research infrastructures encompassing resources located at the edge of the network.

Cited By

  • (2023) Infrastructure Manager: A TOSCA-Based Orchestrator for the Computing Continuum. Journal of Grid Computing 21:3. https://doi.org/10.1007/s10723-023-09686-7. Online publication date: 14-Sep-2023

Published In

FRAME '21: Proceedings of the 1st Workshop on Flexible Resource and Application Management on the Edge
June 2021
53 pages
ISBN:9781450383844
DOI:10.1145/3452369
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data science
  2. edge computing
  3. research infrastructure
  4. workflow languages

Qualifiers

  • Short-paper

Funding Sources

  • European Community's H2020 Program under the scheme "INFRAIA-01-2018-2019: Integrating Activities for Advanced Communities"

Conference

HPDC '21

Article Metrics

  • Downloads (Last 12 months)84
  • Downloads (Last 6 weeks)15
Reflects downloads up to 14 Feb 2025
