Applications and challenges of high performance computing in genomics

  • Review Paper
  • CCF Transactions on High Performance Computing

Abstract

With the rapid development of high-throughput sequencing technologies, the volume of sequencing data is growing at an unprecedented rate. In genomics, high performance computing (HPC), which uses supercomputers and parallel processing to solve complex computational problems by distributing intensive operations across large pools of resources, is urgently needed to process these large-scale data. HPC now plays an important role in data-driven science and is widely used in genomics research. However, many challenges still limit its wider application to massive, multi-dimensional genomics data, including high data complexity, large memory requirements and poor parallel computing performance. In this paper, we review the irreplaceable applications of HPC in genomics, especially in pan-genome, single-cell transcriptome and large-scale population sequencing studies. In the future, as hardware acceleration and algorithm optimization methods continue to develop, HPC will become increasingly indispensable to complex, large-scale genomics studies.
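As a rough illustration of the data-parallel pattern described above, the sketch below counts k-mers (length-k substrings) in a set of sequencing reads by splitting the reads into independent chunks, counting each chunk in a separate worker, and merging the partial tables. It is not a method from this review; the function names `count_kmers`, `split_into_chunks` and `parallel_kmer_count` are hypothetical, and reads are assumed to be plain strings already loaded into memory. Only the Python standard library (`concurrent.futures`) is used.

```python
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor
from functools import partial
from typing import List, Type


def count_kmers(reads: List[str], k: int) -> Counter:
    """Map step (hypothetical helper): count every length-k substring in one chunk of reads."""
    counts: Counter = Counter()
    for read in reads:
        # Reads shorter than k contribute nothing because the range is empty.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_into_chunks(reads: List[str], n_chunks: int) -> List[List[str]]:
    """Partition reads into at most n_chunks contiguous, independent chunks."""
    size = max(1, -(-len(reads) // n_chunks))  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def parallel_kmer_count(
    reads: List[str],
    k: int,
    workers: int = 4,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> Counter:
    """Scatter chunks to workers, then reduce the partial counts into one table."""
    total: Counter = Counter()
    with executor_cls(max_workers=workers) as pool:
        # partial() binds k so each task only receives its chunk of reads.
        for partial_counts in pool.map(partial(count_kmers, k=k),
                                       split_into_chunks(reads, workers)):
            total.update(partial_counts)  # reduce step: merge per-chunk tables
    return total
```

A typical call would be `parallel_kmer_count(reads, k=21, workers=8)`, placed under an `if __name__ == "__main__":` guard because `ProcessPoolExecutor` may start worker processes that re-import the main module. Passing `ThreadPoolExecutor` as `executor_cls` runs the same scatter/merge logic in threads, which is convenient for testing but gives little speed-up for this pure-Python, CPU-bound loop. The merged `Counter` holds every distinct k-mer in memory, a small-scale version of the memory pressure that large genomics datasets place on HPC systems.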

Fig. 1
Fig. 2

Acknowledgements

This study was supported by the National Key Research Program of China [2017YFC0907503 to J.X. and 2016YFC0901903 to Z.D.]; the Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030400 to J.X.]; the National Natural Science Foundation of China [31970634 and 31771465 to J.X.]; and the CAS Key Technology Talent Program [to Z.D.].

Author information

Corresponding authors

Correspondence to Zhenglin Du or Jingfa Xiao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Jiang, M., Bu, C., Zeng, J. et al. Applications and challenges of high performance computing in genomics. CCF Trans. HPC 3, 344–352 (2021). https://doi.org/10.1007/s42514-021-00081-w

  • DOI: https://doi.org/10.1007/s42514-021-00081-w
