Abstract
Mapping reads to a reference genome is the first and a crucial step in genome analysis. One of the most widely used data structures for DNA read mapping is the FM-index, and many variants of it exist today. In this paper, we introduce two new variants of the FM-index designed especially for DNA sequences. The proposed variants decrease the number of cache misses by efficiently interleaving the auxiliary data structures of the FM-index, thereby increasing search speed. Experimental results show that the proposed variants are about twice as fast as other variants while having comparatively low memory requirements. Source code is available at https://github.com/xsitarcik/DNASeqMap.
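As a rough illustration of the ideas above, the following Python sketch builds a small FM-index over the DNA alphabet and counts pattern occurrences by backward search. The interleaved layout is a generic one assumed for illustration: each block stores its rank checkpoint (base counts before the block) next to its BWT characters, so one rank query reads a single block. It is not necessarily the layout of either variant proposed in this paper. The names `InterleavedFMIndex`, `Block` and `BLOCK_SIZE` are hypothetical, and the naive suffix sorting is only suitable for short texts.

```python
from dataclasses import dataclass

ALPHABET = "ACGT"  # assumed DNA alphabet; '$' is the end-of-text sentinel
BLOCK_SIZE = 64    # hypothetical block length (characters per block)


@dataclass
class Block:
    counts: dict  # occurrences of each base in the BWT before this block
    chars: str    # the BWT characters stored in this block


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by naive suffix sorting (short texts only)."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in sa)  # s[-1] == '$' for the suffix at 0


class InterleavedFMIndex:
    def __init__(self, text: str, block_size: int = BLOCK_SIZE):
        if set(text) - set(ALPHABET):
            raise ValueError("text must contain only A, C, G and T")
        self.block_size = block_size
        last = bwt(text)
        self.n = len(last)
        # C[c]: number of characters in text + '$' that are smaller than c
        self.C = {}
        total = 1  # the single '$' sorts before every base
        for c in ALPHABET:
            self.C[c] = total
            total += last.count(c)
        # Interleave: each block keeps its rank checkpoint next to its characters,
        # so a rank query reads a single block instead of two separate arrays.
        self.blocks = []
        running = dict.fromkeys(ALPHABET, 0)
        for start in range(0, self.n, block_size):
            chunk = last[start:start + block_size]
            self.blocks.append(Block(dict(running), chunk))
            for ch in chunk:
                if ch in running:  # skip the '$' sentinel
                    running[ch] += 1
        # Trailing empty block holds the totals, so rank(c, n) needs no special case
        self.blocks.append(Block(dict(running), ""))

    def rank(self, c: str, i: int) -> int:
        """Occurrences of base c in BWT[0:i], read from exactly one block."""
        block = self.blocks[i // self.block_size]
        return block.counts[c] + block.chars[: i % self.block_size].count(c)

    def count(self, pattern: str) -> int:
        """Backward search: number of occurrences of pattern in the text."""
        lo, hi = 0, self.n  # half-open range of matching BWT rows
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = InterleavedFMIndex("ACGTACGTTACG", block_size=4)
    print(index.count("ACG"))  # 3
```

In a compiled implementation, a block of this kind would typically be a fixed-size record (four counters plus 2-bit-packed bases) sized to fit in one cache line, so each rank call costs at most one cache miss rather than separate accesses to a checkpoint array and a BWT array.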
Acknowledgement
This work was partially supported by the Scientific Grant Agency of the Slovak Republic (Grant No. VG 1/0458/18), by grant APVV-16-0484, and by the STU Grant Scheme for Support of Young Researchers.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sitarčík, J., Lucká, M. (2020). Cache-Efficient FM-Index Variants for Mapping of DNA Sequences. In: Fdez-Riverola, F., Rocha, M., Mohamad, M., Zaki, N., Castellanos-Garzón, J. (eds) Practical Applications of Computational Biology and Bioinformatics, 13th International Conference. PACBB 2019. Advances in Intelligent Systems and Computing, vol. 1005. Springer, Cham. https://doi.org/10.1007/978-3-030-23873-5_6
DOI: https://doi.org/10.1007/978-3-030-23873-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23872-8
Online ISBN: 978-3-030-23873-5
eBook Packages: Intelligent Technologies and Robotics (R0)