ND-GiST: A Novel Method for Disk-Resident k-mer Indexing

  • Conference paper
New Knowledge in Information Systems and Technologies (WorldCIST'19 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 931)

Abstract

Metagenomics poses several challenges, one of which is data management. A central related concept is the k-mer: a subsequence of length k drawn from a DNA (sub)sequence. This work focuses on indexing k-mers and on supporting box queries, in which a query string of length k may allow multiple nucleobases at each position. We introduce a novel index structure, ND-GiST, that can handle box queries. Compared with a full table scan and the traditional B-tree, ND-GiST shows encouraging performance results.
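To make the query model concrete, the following Python sketch extracts the k-mers of a sequence, encodes a box query as one set of allowed nucleobases per position, and answers it with a full table scan, the baseline mentioned above. Writing queries with IUPAC ambiguity codes is an assumption about query syntax, and all function names are hypothetical. The sketch also includes a per-position bounding box with an overlap test in the spirit of the ND-tree's discrete bounding rectangles; whether ND-GiST uses exactly this test is an assumption, not something stated here.

```python
# Hypothetical sketch: k-mer extraction, box queries, and a full-table-scan baseline.
# This is NOT the ND-GiST implementation; names and query syntax are assumptions.

# IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def kmers(seq: str, k: int) -> list[str]:
    """Return every contiguous substring of length k, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def parse_box_query(pattern: str) -> list[frozenset[str]]:
    """Turn an IUPAC pattern such as 'ASG' into one allowed-base set per position."""
    return [frozenset(IUPAC[c]) for c in pattern.upper()]


def matches(kmer: str, box: list[frozenset[str]]) -> bool:
    """A k-mer satisfies a box query if every base lies in its position's set."""
    return len(kmer) == len(box) and all(b in allowed for b, allowed in zip(kmer, box))


def full_scan(table: list[str], box: list[frozenset[str]]) -> list[str]:
    """Baseline: test every stored k-mer against the query."""
    return [km for km in table if matches(km, box)]


def bounding_box(group: list[str]) -> list[frozenset[str]]:
    """Per-position union of the bases occurring in a group of equal-length k-mers
    (an ND-tree-style discrete bounding rectangle; assumed, not taken from the paper)."""
    return [frozenset(column) for column in zip(*group)]


def overlaps(bbox: list[frozenset[str]], box: list[frozenset[str]]) -> bool:
    """If any position has an empty intersection, no k-mer in the group can match,
    so a tree index could skip the whole group. Overlap does not guarantee a match."""
    return len(bbox) == len(box) and all(a & q for a, q in zip(bbox, box))


if __name__ == "__main__":
    table = kmers("ACGTACGGA", 3)
    query = parse_box_query("ASG")  # A, then C or G, then G
    print(full_scan(table, query))  # ['ACG', 'ACG']
    print(overlaps(bounding_box(["CGT", "GTA"]), query))  # False
```

Running the file prints `['ACG', 'ACG']` and `False`: the group {CGT, GTA} can be pruned because none of its k-mers starts with A. The full scan, by contrast, tests every stored k-mer, which is the cost a disk-resident index tries to avoid.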

Notes

  1. See: https://www.postgresql.org/docs/10/gist.html.

  2. The records are listed in the order of insertion into the tree.

  3. See: https://www.bioinformatics.org/sms/iupac.html.

  4. See: https://www.postgresql.org/docs/10/indexes-types.html.

Acknowledgments

The project has been supported by the European Union’s Horizon 2020 research and innovation program under grant agreement no. 643476 (COMPARE), by the Novo Nordisk Foundation Interdisciplinary Synergy Programme [Grant NNF15OC0016584] and by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).

Author information

Correspondence to János Márk Szalai-Gindl.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Szalai-Gindl, J.M., Kiss, A., Halász, G., Dobos, L., Csabai, I. (2019). ND-GiST: A Novel Method for Disk-Resident k-mer Indexing. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S. (eds) New Knowledge in Information Systems and Technologies. WorldCIST'19 2019. Advances in Intelligent Systems and Computing, vol 931. Springer, Cham. https://doi.org/10.1007/978-3-030-16184-2_63