DOI: 10.1145/3400302.3415651
research-article

Seed-and-vote based in-memory accelerator for DNA read mapping

Published: 17 December 2020

ABSTRACT

Genome analysis is becoming increasingly important in forensic science, medicine, and history. Sequencing technologies such as High Throughput Sequencing (HTS) and Third Generation Sequencing (TGS) have greatly accelerated genome sequencing, but read mapping remains significantly slower than sequencing. Because read mapping processes enormous amounts of data, the bandwidth between memory and the processing unit limits execution speed. In-memory computing can relieve this memory-bandwidth bottleneck by minimizing data transfers. Because of their fast search capability, Ternary Content Addressable Memories (TCAMs) have been used in accelerators for seed-and-extend, a popular read mapping approach. Seed-and-vote, another read mapping approach, is faster than seed-and-extend but less accurate on very short reads. As sequencing technology moves toward longer reads, seed-and-vote is becoming more viable. We propose a read mapping accelerator that uses approximate TCAM to execute the Fast Seed and Vote algorithm (FSVA), which can map both short and long reads. We achieve a 400× speedup over the seed-and-extend mapper BWA-MEM on a CPU, and a 115× speedup with a 30× energy improvement over a state-of-the-art in-memory seed-and-extend accelerator, at 98.75% accuracy for 100 bp reads.
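To make the seed-and-vote idea concrete, below is a minimal software sketch of generic seed-and-vote mapping. It is not the authors' FSVA and does not model their TCAM hardware: as an assumption, an exact-match hash index of every k-mer in the reference stands in for the approximate TCAM search, and the names `build_index` and `seed_and_vote` and the parameters `k` and `step` are hypothetical. Each seed taken from the read at offset `offset` that occurs at reference position `pos` casts one vote for the candidate mapping location `pos - offset`, and the location with the most votes is reported.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Optional


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of `reference` to the sorted positions where it occurs.

    Assumption: an exact-match hash table stands in for the approximate
    TCAM search used by the accelerator described above.
    """
    index: defaultdict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def seed_and_vote(read: str, index: dict[str, list[int]], k: int,
                  step: int = 1) -> Optional[tuple[int, int]]:
    """Return (location, votes) for the best-supported mapping location.

    Seeds of length k are taken every `step` bases of the read. A seed at
    read offset `offset` found at reference position `pos` casts one vote
    for the read starting at reference location `pos - offset`.
    Returns None when no seed occurs in the reference.
    """
    votes: Counter[int] = Counter()
    for offset in range(0, len(read) - k + 1, step):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            votes[pos - offset] += 1
    if not votes:
        return None  # no seed hit the reference
    # Highest vote count wins; on ties, Counter.most_common returns the
    # location that received its first vote earliest.
    location, count = votes.most_common(1)[0]
    return location, count
```

For instance, if a 100 bp read is copied from a 1,000 bp reference starting at position 300 and the reference is indexed with `k = 12`, the true location 300 collects one vote from each of the read's 89 seeds; a single substitution in the middle of the read removes only the 12 votes from seeds that overlap it, which is why voting tolerates isolated sequencing errors.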


            Published in

            ICCAD '20: Proceedings of the 39th International Conference on Computer-Aided Design
            November 2020
            1396 pages
            ISBN: 9781450380263
            DOI: 10.1145/3400302
            General Chair: Yuan Xie

            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 457 of 1,762 submissions, 26%
