Effective Identification and Annotation of Fungal Genomes

  • Regular Paper
Journal of Computer Science and Technology

Abstract

In the past few decades, the dangers of mycosis have caused widespread concern. With the development of sequencing technology, the effective analysis of fungal sequencing data has become a research hotspot. Although the volume of fungal sequencing data keeps growing, sufficient approaches for the identification and functional annotation of fungal chromosomal genomes are still lacking. To overcome this challenge, this paper first examines approaches to the identification and annotation of fungal genomes based on short and long reads produced by multiple sequencing platforms such as Illumina and PacBio. It then develops an automated bioinformatics pipeline called PFGI for the identification and annotation task. An experimental evaluation on real-world data from the European Nucleotide Archive (ENA) shows that PFGI provides a user-friendly way to perform fungal identification and annotation from sequencing data, and that it can produce results accurate to the species level (97% sequence identity).
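To make the species-level criterion concrete, the following is a minimal sketch of how a 97% sequence-identity cutoff might be applied to a pair of already aligned sequences. It is not PFGI's implementation: the function names, the choice to skip gap columns, and the assumption that the inputs are pre-aligned are all illustrative. Only the 0.97 threshold comes from the abstract.

```python
# Hypothetical illustration; not taken from the PFGI pipeline.
SPECIES_IDENTITY_THRESHOLD = 0.97  # "97% sequence identity" from the abstract


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Fraction of gap-free alignment columns in which both sequences agree."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == "-" or r == "-":
            # Assumption: gap columns are ignored; other definitions count them.
            continue
        compared += 1
        if q == r:
            matches += 1
    return matches / compared if compared else 0.0


def same_species(aligned_query: str, aligned_ref: str) -> bool:
    """Apply the species-level cutoff to a pair of aligned sequences."""
    return percent_identity(aligned_query, aligned_ref) >= SPECIES_IDENTITY_THRESHOLD
```

In practice the identity value would more likely be read from an aligner's output than recomputed by hand; the sketch only shows what the threshold means.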

Corresponding author

Correspondence to Jian Liu.

Supplementary Information

ESM 1

(PDF 378 kb)

About this article

Cite this article

Liu, J., Sun, JL. & Liu, YZ. Effective Identification and Annotation of Fungal Genomes. J. Comput. Sci. Technol. 36, 248–260 (2021). https://doi.org/10.1007/s11390-021-0856-4

  • DOI: https://doi.org/10.1007/s11390-021-0856-4
