Abstract
Precision medicine requires mapping long DNA sequences, which are represented as strings of characters or numbers. Most genome mapping programs rely on suffix-based data structures, but these are much better suited to mapping short DNA sequences represented as strings over small alphabets. The most important parameters of a data structure used for DNA mapping are the time needed to fill it, the search time, and the system resources required, especially memory, because the amount of data produced by the scanning process can be very large. This article describes the implementation of a memory-optimized Ternary Search Tree (TST) for indexing the positions of labels obtained by a Bionano Genomics DNA imaging device. A BNX file parser with alphabet encoding functions is described, and performance results from experiments with the presented software on real data from the Bionano Genomics Saphyr device are included.
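To make the indexing idea concrete, the following is a minimal sketch of a plain textbook ternary search tree that maps fixed-length windows (n-grams) of an encoded label sequence to their start positions. It is not the authors' memory-optimized implementation, whose details are not given in the abstract. The names `Node`, `TernarySearchTree` and `index_ngrams`, the integer symbol encoding, and the window length are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


class Node:
    """One TST node: a symbol, three children, and the positions of keys ending here."""
    __slots__ = ("symbol", "lo", "eq", "hi", "positions")

    def __init__(self, symbol: int) -> None:
        self.symbol = symbol
        self.lo: Optional["Node"] = None   # subtree for smaller symbols
        self.eq: Optional["Node"] = None   # subtree for the next symbol of the key
        self.hi: Optional["Node"] = None   # subtree for larger symbols
        self.positions: List[int] = []


class TernarySearchTree:
    """Generic TST over integer symbols (hypothetical encoding, not the BNX one)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: Sequence[int], position: int) -> None:
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._insert(self.root, key, 0, position)

    def _insert(self, node: Optional[Node], key: Sequence[int],
                i: int, position: int) -> Node:
        symbol = key[i]
        if node is None:
            node = Node(symbol)
        if symbol < node.symbol:
            node.lo = self._insert(node.lo, key, i, position)
        elif symbol > node.symbol:
            node.hi = self._insert(node.hi, key, i, position)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, position)
        else:
            node.positions.append(position)  # key ends at this node
        return node

    def search(self, key: Sequence[int]) -> List[int]:
        """Return all positions stored for an exact key, or an empty list."""
        node, i = self.root, 0
        while node is not None and key:
            symbol = key[i]
            if symbol < node.symbol:
                node = node.lo
            elif symbol > node.symbol:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return list(node.positions)
        return []


def index_ngrams(symbols: Sequence[int], n: int) -> TernarySearchTree:
    """Insert every window of length n together with its start position."""
    tree = TernarySearchTree()
    for start in range(len(symbols) - n + 1):
        tree.insert(symbols[start:start + n], start)
    return tree
```

In this sketch, `index_ngrams(encoded, 2).search([1, 4])` would return every start position at which the pair `1, 4` occurs in `encoded`. Each node stores a single symbol and three child references, so node count grows with the number of distinct prefixes rather than with alphabet size, which is the general reason a TST is attractive for larger alphabets.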
Acknowledgements
This work is supported by SGS project, VSB-Technical University of Ostrava, under the grant no. SP2020/161 and Celgene Research Grant-CZ-102.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hřivňák, R., Gajdoš, P., Snášel, V. (2021). Towards Faster Matching Algorithm Using Ternary Tree in the Area of Genome Mapping. In: Barolli, L., Li, K., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2020. Advances in Intelligent Systems and Computing, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-57796-4_40
DOI: https://doi.org/10.1007/978-3-030-57796-4_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57795-7
Online ISBN: 978-3-030-57796-4
eBook Packages: Intelligent Technologies and Robotics (R0)