Paper

Authors: Eleonora Mian ; Enrico Petrucci ; Cinzia Pizzi and Matteo Comin

Affiliation: Department of Information Engineering, University of Padova, Padova, 35131, Italy

Keyword(s): k-Mers, Gapped q-Gram, Multiple Spaced Seeds, Efficient Hashing.

Abstract: Alignment-free analysis of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Hashing k-mers is a common operation across many alignment-free applications, and it is widely used for indexing, querying, and rapid similarity search. Recently, spaced seeds, a special type of pattern that accounts for errors or mutations, have come to be routinely used instead of k-mers. Spaced seeds improve sensitivity with respect to k-mers in many applications; however, hashing spaced seeds substantially increases the computational time. Moreover, using multiple spaced seeds can further increase accuracy, at the cost of running time. In this paper we address the problem of efficient multiple spaced seed hashing. The proposed algorithms exploit the similarity of adjacent spaced seed hash values in an input sequence in order to efficiently compute the next hashes. We report the results of several tests, which show that our methods significantly outperform the previously proposed algorithms, with a speedup that can reach 20x. We also apply these efficient spaced seed hashing algorithms to an application in the field of metagenomics, the classification of reads performed by Clark-S (Ounit and Lonardi, 2016), and we show that a significant speedup can be obtained, thus resolving the slowdown introduced by the use of multiple spaced seeds. Code available at: https://github.com/CominLab/MISSH.
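
To make the setting concrete, the following is a minimal sketch of naive spaced seed hashing, that is, the slow baseline in which every window is hashed from scratch for every seed. It is not the algorithm proposed in the paper and is not taken from the MISSH repository. The 2-bit nucleotide encoding, the '1'/'0' seed notation, and the names `ENCODE`, `spaced_seed_hash`, and `hash_all` are assumptions made for illustration.

```python
# Assumed 2-bit encoding of nucleotides (illustrative, not from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hash(seq, pos, seed):
    """Hash the window of `seq` starting at `pos` under one spaced seed.

    `seed` is a string over {'1', '0'}: '1' marks a position that
    contributes to the hash, '0' marks a position that is ignored.
    """
    h = 0
    shift = 0
    for offset, care in enumerate(seed):
        if care == "1":
            # Pack the 2-bit code of each selected symbol into the hash.
            h |= ENCODE[seq[pos + offset]] << shift
            shift += 2
    return h


def hash_all(seq, seeds):
    """Naively hash every window of `seq` under every seed in `seeds`."""
    return {
        seed: [
            spaced_seed_hash(seq, i, seed)
            for i in range(len(seq) - len(seed) + 1)
        ]
        for seed in seeds
    }


if __name__ == "__main__":
    # "111" behaves like a contiguous 3-mer; "1101" skips its third position.
    print(hash_all("ACGTAC", ["111", "1101"]))
```

In this sketch the cost grows with the number of windows, the number of seeds, and the number of '1' positions per seed. Adjacent windows share most of their selected symbols, which is the redundancy the abstract refers to when it mentions exploiting the similarity of adjacent hash values.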

CC BY-NC-ND 4.0

Paper citation in several formats:
Mian, E.; Petrucci, E.; Pizzi, C. and Comin, M. (2023). Efficient Hashing of Multiple Spaced Seeds with Application. In Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS; ISBN 978-989-758-631-6; ISSN 2184-4305, SciTePress, pages 155-162. DOI: 10.5220/0011632900003414

@conference{bioinformatics23,
author={Eleonora Mian and Enrico Petrucci and Cinzia Pizzi and Matteo Comin},
title={Efficient Hashing of Multiple Spaced Seeds with Application},
booktitle={Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS},
year={2023},
pages={155-162},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011632900003414},
isbn={978-989-758-631-6},
issn={2184-4305},
}

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) - BIOINFORMATICS
TI - Efficient Hashing of Multiple Spaced Seeds with Application
SN - 978-989-758-631-6
IS - 2184-4305
AU - Mian, E.
AU - Petrucci, E.
AU - Pizzi, C.
AU - Comin, M.
PY - 2023
SP - 155
EP - 162
DO - 10.5220/0011632900003414
PB - SciTePress