ABSTRACT
The National Research and Innovation Agency (BRIN) of Indonesia hosts high-performance computing (HPC) facilities to support research and innovation that require substantial computational resources. Bioinformatics is one such research area. As sequencing technology advances, any lab with access to next-generation sequencing (NGS) can generate a huge amount of data in a very short time. The difficulty has therefore shifted to the data analysis step that follows, which usually requires significant computational resources, many specialized tools that must be chained together, and personnel who are familiar with command-line syntax. Chaining multiple tools into a comprehensive workflow is also difficult, since it requires understanding both computer system administration and the biology behind the question being asked. These obstacles hinder biologists from taking advantage of sequencing technology in their research. In this technical report, we describe our approach to integrating Galaxy with the BRIN HPC facility, which makes it easier for users to deploy their analysis workflows there.
Index Terms
- Implementation of Workflow Engine on BRIN HPC Infrastructure