
Accelerating large-scale genomic analysis with Spark



Abstract:

High-throughput next-generation sequencing technologies are producing a flood of inexpensive genomic data, giving precision medicine the opportunity to better understand the primary causes of complex diseases such as cancer. However, even current state-of-the-art analysis approaches lag far behind the pace of data generation because of limited scalability, accuracy, and computational efficiency. To explore how to synthesize genomic data into knowledge efficiently and effectively, we propose GATK-Spark, a balanced parallelization approach that implements an in-memory version of GATK on Apache Spark. First, we perform a rigorous analysis of current GATK optimization strategies and identify three major scalability bottlenecks: poor compute-resource utilization, text-based data formats, and long-running single-threaded file splitting and merging operations. Second, we share our experience designing GATK-Spark, a new approach that optimizes GATK with the big-data computing framework Apache Spark, reducing the original 20-hour execution to 30 minutes, a speedup in excess of 37x on 256 CPU cores. This work will facilitate both the understanding of the genomics analytics pipeline and the design of strategies for accelerating large-scale genomic analysis applications.
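The abstract does not describe how GATK-Spark distributes work, so the following is only a minimal, hypothetical sketch of the general idea of region-level parallelism on Spark: reads are kept in memory, bucketed by genomic region so that work is balanced across cores, and each region is processed independently, avoiding single-threaded file splitting and merging. The Read and Variant types and the callVariants function are illustrative placeholders, not part of GATK or GATK-Spark.

// Illustrative sketch only; types and callVariants are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

case class Read(chrom: String, pos: Long, seq: String)
case class Variant(chrom: String, pos: Long, ref: String, alt: String)

object RegionParallelSketch {
  // Hypothetical per-region variant caller standing in for a GATK stage.
  def callVariants(reads: Iterable[Read]): Iterator[Variant] =
    reads.iterator.map(r => Variant(r.chrom, r.pos, "A", "C")) // placeholder logic

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("region-parallel-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Assume reads are already parsed into an RDD (ideally from a splittable,
    // non-text format); here a tiny in-memory sample for illustration.
    val reads = sc.parallelize(Seq(
      Read("chr1", 10100L, "ACGT"),
      Read("chr1", 10200L, "TTGA"),
      Read("chr2", 20050L, "GGCA")
    ))

    // Bucket reads into fixed-size regions, then call variants per region
    // in parallel -- all in memory, with no intermediate file cutting/merging.
    val regionSize = 1000000L
    val variants = reads
      .keyBy(r => (r.chrom, r.pos / regionSize))
      .groupByKey()
      .flatMap { case (_, regionReads) => callVariants(regionReads) }

    variants.collect().foreach(println)
    spark.stop()
  }
}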
Date of Conference: 15-18 December 2016
Date Added to IEEE Xplore: 19 January 2017
Conference Location: Shenzhen
