research-article

SWARAM: Portable Energy and Cost Efficient Embedded System for Genomic Processing

Published: 08 October 2019

Abstract

High-quality precision medicine requires a thorough understanding of a patient's genetic composition. Ideally, the unique variations in an individual's genome are identified in order to specify the necessary treatment. A variant calling workflow is a pipeline of state-of-the-art software tools for alignment, sorting, and variant calling of whole genome sequencing (WGS) data; it identifies the unique variations in an individual's genome relative to a reference genome. Currently, such a workflow is implemented on high-performance computers (with additional GPUs or FPGAs) or on cloud computers. Such systems are large and costly, and they rely on the internet for genome data transfer, which makes them unusable in remote locations without internet connectivity. They also raise privacy concerns, because processing is carried out in a different facility.
To overcome these limitations, this paper presents, for the first time, a cost-efficient, offline, scalable, portable, and energy-efficient computing system named SWARAM for variant calling workflow processing. The system uses a novel architecture and algorithms that match against partial reference genomes, exploiting the smaller memory sizes typically available in tiny processing systems. Extensive tests on a standard benchmark dataset (NA12878 Illumina platinum genome) confirm that the time SWARAM takes for data transfer and for completing the variant calling workflow is competitive with that of a 32-core Intel Xeon server at similar accuracy, while SWARAM costs less than a fifth as much and consumes less than 40% of the energy of the server system. The original scripts and code we developed for executing the variant calling workflow on SWARAM are available in the associated GitHub repository https://github.com/Rammohanty/swaram.
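
The idea of matching against partial reference genomes can be illustrated with a small Python sketch. This is not SWARAM's algorithm or code (the repository above holds the authors' scripts); it is a toy model under stated assumptions. The function names, the chunk size, the exhaustive Hamming-distance matcher, and the short sequences are all hypothetical and chosen only for illustration. The sketch holds one chunk of the reference in memory at a time, keeps the best hit per read across chunks, then sorts the hits and reports mismatching bases as variants, mirroring the alignment, sorting, and variant calling stages named in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    read: str
    pos: int         # 0-based position in the full reference
    mismatches: int


def hamming(a: str, b: str) -> int:
    """Number of differing characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit_in_chunk(read: str, chunk: str, offset: int) -> Optional[Hit]:
    """Slide the read across one chunk and return its best placement, if any."""
    best = None
    for i in range(len(chunk) - len(read) + 1):
        mm = hamming(read, chunk[i:i + len(read)])
        if best is None or mm < best.mismatches:
            best = Hit(read, offset + i, mm)
    return best


def align_with_partial_references(reads: List[str], reference: str,
                                  chunk_size: int,
                                  max_mismatches: int = 1) -> List[Hit]:
    """Align reads while looking at only one reference chunk at a time."""
    overlap = max(len(r) for r in reads) - 1   # so no placement is split
    step = chunk_size - overlap
    assert step > 0, "chunk_size must exceed the longest read length - 1"
    best = {}
    for start in range(0, len(reference), step):
        chunk = reference[start:start + chunk_size]   # the "partial reference"
        for idx, read in enumerate(reads):
            hit = best_hit_in_chunk(read, chunk, start)
            if hit is None or hit.mismatches > max_mismatches:
                continue
            if idx not in best or hit.mismatches < best[idx].mismatches:
                best[idx] = hit
    return list(best.values())


def call_variants(hits: List[Hit], reference: str) -> List[Tuple[int, str, str]]:
    """Sort hits by position and report (position, ref_base, alt_base)."""
    variants = set()
    for hit in sorted(hits, key=lambda h: h.pos):     # sorting stage
        for i, base in enumerate(hit.read):
            ref_base = reference[hit.pos + i]
            if base != ref_base:
                variants.add((hit.pos + i, ref_base, base))
    return sorted(variants)


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGGATCCAT"
    # Two of the three reads carry a T->A substitution at reference position 12.
    reads = ["GTTGCAAG", "GGCATACG", "CATACGGA"]
    hits = align_with_partial_references(reads, reference, chunk_size=12)
    print(call_variants(hits, reference))   # [(12, 'T', 'A')]
```

Consecutive chunks overlap by one base less than the read length, so every possible placement of a read falls entirely inside at least one chunk; in this toy setting the chunked search therefore finds the same hits as a search over the whole reference.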


    Published In

    ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
    Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
    October 2019
    1423 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3365919
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 October 2019
    Accepted: 01 July 2019
    Revised: 01 June 2019
    Received: 01 April 2019
    Published in TECS Volume 18, Issue 5s

    Author Tags

    1. ARM
    2. DNA analysis
    3. Genome
    4. alignment
    5. embedded system
    6. energy efficient system
    7. genetic analysis
    8. portable genome analysis
    9. variant calling

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) Revolutionizing Genomic Data Management in the Cloud - Novel Approaches for Secure Transfer and Storage of Sequencing VCF Data in the Realm of AI and Cybersecurity - A Perspective. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1-6. DOI: 10.1109/ICCCNT61001.2024.10724857. Online publication date: 24-Jun-2024
    • (2022) DMD: DNA alignment in memory constrained device. 2022 IEEE International Symposium on Smart Electronic Systems (iSES), 497-502. DOI: 10.1109/iSES54909.2022.00108. Online publication date: Dec-2022
    • (2022) A Vision for Leveraging the Concept of Digital Twins to Support the Provision of Personalized Cancer Care. IEEE Internet Computing 26:5, 17-24. DOI: 10.1109/MIC.2021.3065381. Online publication date: 1-Sep-2022
    • (2019) Security Vulnerabilities in Applying Decentralized Ledger Systems for Obfuscating Hardwares. 2019 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), 272-275. DOI: 10.1109/iSES47678.2019.00067. Online publication date: Dec-2019
