research-article

Seed-and-vote based in-memory accelerator for DNA read mapping

Authors:
Ann Franchesca Laguna, University of Notre Dame
Hasindu Gamaarachchi, University of New South Wales
Xunzhao Yin, Zhejiang University
Michael Niemier, University of Notre Dame
Sri Parameswaran, University of New South Wales
X. Sharon Hu, University of Notre Dame

ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design, November 2020, Article No. 56, pages 1–9. https://doi.org/10.1145/3400302.3415651

Published: 17 December 2020

ABSTRACT

Genome analysis is becoming increasingly important in forensic science, medicine, and history. Sequencing technologies such as High Throughput Sequencing (HTS) and Third Generation Sequencing (TGS) have greatly accelerated genome sequencing, but read mapping remains significantly slower than sequencing itself. Because of the enormous amount of data involved, the speed of data transfer between memory and the processing unit limits execution speed. In-memory computing can address this memory-bandwidth bottleneck by minimizing data transfers. Ternary Content Addressable Memories (TCAMs) have been used in accelerators for seed-and-extend, a popular read mapping approach, because of their fast search capability. Seed-and-vote, another read mapping approach, is faster than seed-and-extend but is less accurate on very short reads. Since sequencing technology is moving toward longer reads, the seed-and-vote approach is becoming more viable. We propose a genome read mapping accelerator that uses approximate TCAMs to execute the Fast Seed and Vote algorithm (FSVA), which can map both short and long reads. For 100 bp reads at 98.75% accuracy, the accelerator achieves a 400× speedup over the seed-and-extend mapper BWA-MEM running on a CPU, and a 115× speedup with a 30× energy improvement over a state-of-the-art in-memory accelerator based on seed-and-extend.
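To make the seed-and-vote idea concrete, here is a minimal software sketch. It is not the authors' FSVA or their hardware design. It assumes that (1) the reference is stored as one fixed-length k-mer row per position, standing in for TCAM rows; (2) the "approximate" TCAM match can be modeled as a Hamming-distance threshold, with `N` in a stored row treated as a don't-care; and (3) each seed hit votes for the read start position it implies, and the most-voted position is reported. The function names `build_tcam`, `approx_search`, and `map_read` are hypothetical.

```python
from collections import Counter


def build_tcam(reference, k):
    """Model TCAM contents: one row per reference position, holding (k-mer, position)."""
    return [(reference[i:i + k], i) for i in range(len(reference) - k + 1)]


def approx_search(tcam, seed, max_mismatches):
    """Return positions of rows within `max_mismatches` of `seed` (Hamming distance).

    A hardware TCAM compares the key against all rows at once; this loop only
    models the match result. Assumption: 'N' in a stored row is a don't-care.
    """
    hits = []
    for row, pos in tcam:
        mismatches = sum(1 for r, s in zip(row, seed) if r != "N" and r != s)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


def map_read(tcam, read, k, stride, max_mismatches=0):
    """Seed-and-vote: every seed hit votes for the read start position it implies.

    Returns (best_start, votes), or (None, 0) if no seed matched anywhere.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, stride):
        seed = read[offset:offset + k]
        for pos in approx_search(tcam, seed, max_mismatches):
            start = pos - offset  # where the read would begin in the reference
            if start >= 0:
                votes[start] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


if __name__ == "__main__":
    tcam = build_tcam("GATTACAGGCTTCACGTAATCCGTAGC", k=4)
    # Read copied from reference position 8 with one substituted base (A -> G).
    read = "GCTTCGCGTAAT"
    print(map_read(tcam, read, k=4, stride=4, max_mismatches=0))  # (8, 2)
    print(map_read(tcam, read, k=4, stride=4, max_mismatches=1))  # (8, 3)
```

With exact matching, the seed covering the substituted base finds no hit, so position 8 wins with only two votes. Allowing one mismatch lets that seed vote as well. A hardware TCAM evaluates every row in a single search operation, which is the source of the in-memory speedup. The Python loop only reproduces the match semantics, and this sketch does not handle insertions or deletions, which shift seeds relative to one another.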

Published in
ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
November 2020
1396 pages
ISBN: 9781450380263
DOI: 10.1145/3400302
General Chair: Yuan Xie, Univ. of California, Santa Barbara, CA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DNA read mapping
ferroelectric FET
seed-and-vote
ternary content addressable memories
Acceptance Rates
Overall acceptance rate: 457 of 1,762 submissions (26%)
Article Metrics
- Total citations: 20
- Total downloads: 146
- Downloads (last 12 months): 74
- Downloads (last 6 weeks): 3