research-article

RAiSD-X: A Fast and Accurate FPGA System for the Detection of Positive Selection in Thousands of Genomes

Published: 19 December 2019

Abstract

Detecting traces of positive selection in genomes carries theoretical significance and has practical applications, from shedding light on the forces that drive adaptive evolution to designing more effective drug treatments. The size of genomic datasets currently grows at an unprecedented pace, fueled by continuous advances in DNA sequencing technologies, leading to ever-increasing compute and memory requirements for meaningful genomic analyses. The majority of existing methods for positive selection detection either are not designed to handle whole genomes or scale poorly with the sample size; they inevitably resort to a runtime versus accuracy tradeoff, raising an alarming concern for the feasibility of future large-scale scans. To this end, we present RAiSD-X, a high-performance system that relies on a decoupled access-execute processing paradigm for efficient FPGA acceleration and couples a novel, to our knowledge, sliding-window algorithm for the recently introduced μ statistic with a mutation-driven hashing technique to rapidly detect patterns in the data. RAiSD-X achieves up to three orders of magnitude faster processing than widely used software implementations, and more importantly, it can exhaustively scan thousands of human chromosomes in minutes, yielding a scalable full-system solution for future studies of positive selection in species of flora and fauna.

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 13, Issue 1
March 2020
135 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3377289
Editor: Deming Chen

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 19 December 2019
• Accepted: 1 September 2019
• Revised: 1 June 2019
• Received: 1 December 2018

Qualifiers

• research-article
• Research
• Refereed