Mining sequential patterns from uncertain big DNA in the spark framework

Abstract:

Big data has become ubiquitous: high volumes of a wide variety of valuable data of differing veracity (e.g., precise, imprecise, or uncertain data) are made available at high velocity through high-throughput machines and techniques for data gathering and curation in many real-life applications, in domains such as bioinformatics, biomedicine, finance, social networking, and weather forecasting. In bioinformatics, terabytes of deoxyribonucleic acid (DNA) sequences can now be generated within a few hours using next-generation sequencing (NGS) technologies such as Illumina HiSeq X and Illumina Genome Analyzer. Due to the nature of these NGS technologies, the generated data usually contain some noise or other forms of error. These uncertain data nevertheless hold a wealth of information in the form of frequent patterns. Mining frequently occurring patterns (e.g., motifs) from these big uncertain DNA sequences is a challenge in bioinformatics and biomedicine. Many existing algorithms are serial and mine DNA sequence motifs using methods designed for precise data. Mining motifs from big DNA sequences is computationally intensive because of the high volume and associated uncertainty of these sequences. In this paper, we propose a scalable algorithm for high-performance computing in bioinformatics. Specifically, our parallel algorithm uses fault-tolerant resilient distributed datasets (RDDs) in the Apache Spark computing framework to mine sequence motifs from uncertain big DNA data. Experimental results show that our algorithm extracts accurate motifs within a short time frame.
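
The abstract does not spell out the uncertainty model or the mining procedure, so the following is only a minimal sketch of one common way to count motifs in uncertain sequences, not the paper's algorithm. It assumes (hypothetically) that each base call carries a probability of being correct, that the probability of a k-mer occurrence is the product of its base probabilities, and that a motif is "frequent" when its summed probability (its expected support) reaches a threshold. The function names and the threshold parameter are illustrative. The map and reduce steps are written in plain Python; in Spark they would roughly correspond to a `flatMap` followed by a `reduceByKey` over an RDD of reads.

```python
from collections import defaultdict
from math import prod


def kmer_contributions(read, k):
    """Map step: yield (k-mer, probability that every base in the window is correct).

    `read` is a list of (base, probability) pairs; this representation is an
    assumption made for illustration, not taken from the paper.
    """
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        kmer = "".join(base for base, _ in window)
        yield kmer, prod(p for _, p in window)


def expected_support(reads, k):
    """Reduce step: sum the per-occurrence probabilities of each k-mer."""
    totals = defaultdict(float)
    for read in reads:
        for kmer, prob in kmer_contributions(read, k):
            totals[kmer] += prob
    return dict(totals)


def frequent_motifs(reads, k, min_expected_support):
    """Keep only the k-mers whose expected support reaches the threshold."""
    support = expected_support(reads, k)
    return {kmer: s for kmer, s in support.items() if s >= min_expected_support}


if __name__ == "__main__":
    # Two toy uncertain reads; the probabilities are made up for the example.
    reads = [
        [("A", 0.9), ("C", 1.0), ("G", 0.8)],
        [("A", 1.0), ("C", 0.5), ("G", 1.0)],
    ]
    print(frequent_motifs(reads, k=2, min_expected_support=1.35))
```

With these toy reads, `AC` has an expected support of about 1.4 and `CG` about 1.3, so only `AC` passes a threshold of 1.35. Because each read is processed independently in the map step and the reduce step is a per-key sum, this kind of computation partitions naturally across workers.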

Published in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Date of Conference: 15-18 December 2016

Date Added to IEEE Xplore: 19 January 2017

DOI: 10.1109/BIBM.2016.7822641

Conference Location: Shenzhen, China
