DOI: 10.1145/3473258.3473279
research-article

Frigate: a fast, in-memory tool for counting and querying k-mers

Published: 11 December 2021

Abstract

K-mer counting is an important step in many bioinformatics applications, including genome assembly, sequence error correction, and sequence alignment. Advances in next-generation sequencing technologies have led to tremendous growth in genomic data, so k-mer counters need to become faster and more efficient. We present Frigate, a fast and efficient tool for counting and querying k-mers. Its in-memory design uses multithreaded, lock-free data structures to improve performance. Frigate was developed with an emphasis on values of k less than 20 and aims to maximize performance by employing different algorithms for different ranges of k. The results show that Frigate performs comparably to state-of-the-art k-mer counters or achieves speedups of up to 2-3x over them, especially on large datasets.
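
To make the idea of counting and querying k-mers with a different strategy per range of k more concrete, here is a minimal, single-threaded sketch. It is not Frigate's implementation: the 2-bit encoding, the threshold `DIRECT_TABLE_MAX_K`, the choice of a direct-addressed array versus a hash map, and all function names are assumptions made for illustration, and the lock-free multithreading described in the abstract is not shown.

```python
from collections import Counter

# 2-bit codes for the four nucleotides; any other character (e.g. N) resets the window.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

# Hypothetical threshold: 4**12 counters is about 16.8 million entries.
DIRECT_TABLE_MAX_K = 12


def encoded_kmers(seq, k):
    """Yield every k-mer of seq as an integer, using a rolling 2-bit encoding."""
    mask = (1 << (2 * k)) - 1
    code = 0
    valid = 0  # consecutive valid bases seen since the last reset
    for base in seq.upper():
        bits = ENCODE.get(base)
        if bits is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | bits) & mask
        valid += 1
        if valid >= k:
            yield code


def count_kmers(seqs, k):
    """Return a table that maps an encoded k-mer to its count."""
    if k <= DIRECT_TABLE_MAX_K:
        # Small k: one counter per possible k-mer, indexed directly by its code.
        table = [0] * (4 ** k)
        for seq in seqs:
            for code in encoded_kmers(seq, k):
                table[code] += 1
        return table
    # Larger k: store only the k-mers that actually occur.
    table = Counter()
    for seq in seqs:
        table.update(encoded_kmers(seq, k))
    return table


def query(table, kmer):
    """Return the count of a k-mer string (same k and alphabet as the table)."""
    code = 0
    for base in kmer.upper():
        code = (code << 2) | ENCODE[base]
    return table[code]
```

For example, `query(count_kmers(["ACGTACGT"], 3), "ACG")` looks up the 3-mer `ACG` in the direct-addressed table. The trade-off this sketch illustrates is that direct addressing avoids hashing entirely but needs 4^k counters, which is only practical for small k.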



Published In

ICBBT '21: Proceedings of the 2021 13th International Conference on Bioinformatics and Biomedical Technology
May 2021
293 pages
ISBN: 9781450389655
DOI: 10.1145/3473258

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. Genome analysis
2. K-mer counting
3. Parallel computing
4. Performance engineering

Qualifiers

• Research-article
• Research
• Refereed limited
