Towards Faster Matching Algorithm Using Ternary Tree in the Area of Genome Mapping

  • Conference paper
Advances in Intelligent Networking and Collaborative Systems (INCoS 2020)

Abstract

In precision medicine, there is a need to map long DNA sequences, which are represented as strings of characters or numbers. Most computer programs used for genome mapping rely on suffix-based data structures, but these are much better suited to mapping short DNA sequences represented as strings over small alphabets. The most important parameters of a data structure used for DNA mapping are the time needed to fill it, the search time, and the system resources required, especially memory, because the amount of data produced by the scanning process can be very large. This article describes an implementation of a memory-optimized Ternary Search Tree (TST) for indexing the positions of labels obtained by a Bionano Genomics DNA imaging device. A BNX file parser with alphabet encoding functions is described, and performance results from experiments with the presented software on real data from the Bionano Genomics Saphyr device are included.
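
To make the idea of indexing label positions in a ternary search tree concrete, the following is a minimal sketch and not the authors' memory-optimized implementation. It assumes, purely for illustration, that distances between consecutive labels are quantized into integer symbols with a fixed bin width and that every sliding window of n symbols is inserted as a key whose value is its starting index. The names `encode`, `build_index`, `bin_width`, and `n` are hypothetical and do not come from the paper or the BNX specification.

```python
from typing import List, Optional, Sequence


class Node:
    """One TST node: a symbol, three child links, and stored positions."""
    __slots__ = ("symbol", "lo", "eq", "hi", "positions")

    def __init__(self, symbol: int) -> None:
        self.symbol = symbol
        self.lo: Optional["Node"] = None   # keys with a smaller symbol here
        self.eq: Optional["Node"] = None   # keys that continue past this symbol
        self.hi: Optional["Node"] = None   # keys with a larger symbol here
        self.positions: List[int] = []     # start indices of keys ending here


class TernarySearchTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: Sequence[int], position: int) -> None:
        """Store `position` under `key` (iterative, so no recursion limit)."""
        if not key:
            raise ValueError("key must be non-empty")
        if self.root is None:
            self.root = Node(key[0])
        node, i = self.root, 0
        while True:
            s = key[i]
            if s < node.symbol:
                if node.lo is None:
                    node.lo = Node(s)
                node = node.lo
            elif s > node.symbol:
                if node.hi is None:
                    node.hi = Node(s)
                node = node.hi
            elif i + 1 < len(key):
                i += 1
                if node.eq is None:
                    node.eq = Node(key[i])
                node = node.eq
            else:
                node.positions.append(position)
                return

    def search(self, key: Sequence[int]) -> List[int]:
        """Return all positions stored under exactly `key`."""
        if not key:
            return []
        node, i = self.root, 0
        while node is not None:
            s = key[i]
            if s < node.symbol:
                node = node.lo
            elif s > node.symbol:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return list(node.positions)
        return []


def encode(label_positions: Sequence[int], bin_width: int) -> List[int]:
    """Hypothetical encoding: quantize inter-label distances into symbols."""
    return [(b - a) // bin_width
            for a, b in zip(label_positions, label_positions[1:])]


def build_index(symbols: Sequence[int], n: int) -> TernarySearchTree:
    """Insert every window of n symbols, keyed to its start index."""
    tree = TernarySearchTree()
    for start in range(len(symbols) - n + 1):
        tree.insert(symbols[start:start + n], start)
    return tree


# Made-up label positions (not real Saphyr data), in arbitrary units.
symbols = encode([0, 1200, 2500, 2600, 3800, 5100], bin_width=500)
index = build_index(symbols, n=2)
print(symbols)                 # [2, 2, 0, 2, 2]
print(index.search([2, 2]))    # [0, 3]
```

In this sketch each lookup follows one `eq` link per key symbol, and the `lo`/`hi` steps at each level are bounded by the number of distinct symbols, which is one reason a TST can be attractive when the alphabet is larger than the four nucleotide letters.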

Acknowledgements

This work is supported by the SGS project, VSB-Technical University of Ostrava, under grant no. SP2020/161, and by Celgene Research Grant-CZ-102.

Author information

Corresponding author

Correspondence to Rostislav Hřivňák.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hřivňák, R., Gajdoš, P., Snášel, V. (2021). Towards Faster Matching Algorithm Using Ternary Tree in the Area of Genome Mapping. In: Barolli, L., Li, K., Miwa, H. (eds) Advances in Intelligent Networking and Collaborative Systems. INCoS 2020. Advances in Intelligent Systems and Computing, vol 1263. Springer, Cham. https://doi.org/10.1007/978-3-030-57796-4_40
