Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings

Abstract:

Recent advances in DNA sequencing have enabled a flood of sequencing-based applications for studying biology and medicine. A key requirement of these applications is to rapidly and accurately map DNA subsequences to a reference genome. This DNA subsequence mapping problem shares core technical challenges with the similarity query processing problem studied in the database research literature. To solve this problem, existing techniques first extract signatures from a query, then retrieve candidate mapping positions from an index using the extracted signatures, and finally verify the candidate positions. The efficiency of these techniques depends critically on the signatures selected from queries, while signature selection relies on the indexing scheme of the reference genome. Q-gram inverted indexing, one of the most widely used indexing schemes, can discover candidate positions quickly, but has the limitation that signatures of queries are restricted to fixed-length q-grams. To address this limitation, we propose a flexible way to generate variable-length signatures using a fixed-length q-gram index. The proposed technique groups a few q-grams into a variable-length signature, and generates candidate positions for the variable-length signature using the inverted lists of the q-grams. We also propose a novel dynamic programming algorithm to balance the filtering power of signatures against the overhead of generating candidate positions for the signatures. Through extensive experiments on both simulated and real genomic data, we show that our technique substantially improves the performance of read mapping in terms of both mapping speed and accuracy.
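
The idea of deriving candidate positions for a variable-length signature from a fixed-length q-gram index can be illustrated with a minimal sketch. This is not the Hobbes3 implementation: the function names `build_qgram_index` and `signature_candidates` are hypothetical, the signature is assumed to be covered by q-grams whose shifted inverted lists are simply intersected, and the dynamic programming step that chooses which signatures to use is not shown.

```python
from collections import defaultdict


def build_qgram_index(reference, q):
    """Map each q-gram of `reference` to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def signature_candidates(index, signature, q):
    """Return positions where `signature` (length >= q) occurs exactly,
    using only the inverted lists of its q-grams (illustrative assumption)."""
    if len(signature) < q:
        raise ValueError("signature must be at least q characters long")
    # Cover the signature with q-grams at offsets 0, q, 2q, ...
    # and add a final q-gram aligned to the end if a tail remains.
    offsets = list(range(0, len(signature) - q + 1, q))
    if offsets[-1] != len(signature) - q:
        offsets.append(len(signature) - q)
    # Start from the inverted list of the first q-gram.
    candidates = set(index.get(signature[:q], ()))
    for off in offsets[1:]:
        # Shift each list back by its offset so all lists refer to
        # the start position of the whole signature, then intersect.
        shifted = {p - off for p in index.get(signature[off:off + q], ())}
        candidates &= shifted
    return sorted(candidates)


reference = "ACGTACGTTACGA"
index = build_qgram_index(reference, q=3)
print(signature_candidates(index, "ACG", q=3))    # [0, 4, 9]
print(signature_candidates(index, "ACGTT", q=3))  # [4]
```

In this toy example the 3-gram `ACG` alone yields three candidate positions, while the longer signature `ACGTT` narrows them to one, at the cost of reading and intersecting a second inverted list. That trade-off between filtering power and candidate-generation overhead is the one the abstract says the dynamic programming algorithm balances.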

Published in: 2016 IEEE 32nd International Conference on Data Engineering (ICDE)

Date of Conference: 16-20 May 2016

Date Added to IEEE Xplore: 23 June 2016

Electronic ISBN: 978-1-5090-2020-1

DOI: 10.1109/ICDE.2016.7498238

Conference Location: Helsinki, Finland
