Abstract
An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their computation also provides a fast alternative for measuring approximation in sequence comparison. There exists an \(\mathcal {O}(n)\)-time and \(\mathcal {O}(n)\)-space algorithm for computing all minimal absent words on a fixed-sized alphabet based on the construction of suffix array (Barton et al., 2014). An implementation of this algorithm was also provided by the authors and is currently the fastest available. In this article, we present a new \(\mathcal {O}(n)\)-time and \(\mathcal {O}(n)\)-space algorithm for computing all minimal absent words; it has the desirable property that, given the indexing data structure at hand, the computation of minimal absent words can be executed in parallel. Experimental results show that a multiprocessing implementation of this algorithm can accelerate the overall computation by more than a factor of two compared to state-of-the-art approaches. By excluding the indexing data structure construction time, we show that the implementation achieves near-optimal speed-ups.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abboud, A., Williams, V.V., Weimann, O.: Consequences of faster alignment of sequences. In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014. LNCS, vol. 8572, pp. 39–51. Springer, Heidelberg (2014)
Acquisti, C., Poste, G., Curtiss, D., Kumar, S.: Nullomers: really a matter of natural selection? PLoS One 2(10), e1022 (2007)
Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15, 388 (2014)
Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional Burrows-Wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013)
Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theoret. Comput. Sci. 450, 109–116 (2012)
Crochemore, M., Mignosi, F., Restivo, A.: Automata and forbidden words. Inf. Process. Lett. 67, 111–117 (1998)
Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011)
Garcia, S.P., Pinho, A.J.: Minimal absent words in four human genome assemblies. PLoS One 6(12), e29344 (2011)
Garcia, S.P., Pinho, O.J., Rodrigues, J., Bastos, C.A.C., Ferreira, G.P.J.S.: Minimal absent words in prokaryotic and eukaryotic genomes. PLoS One 6, e16065 (2011)
Haubold, B., Pierstorff, N., Möller, F., Wiehe, T.: Genome comparison without alignment using shortest unique substrings. BMC Bioinform. 6, 123 (2005)
Jacobson, G.: Space-efficient static trees and graphs. In: 30th SFCS 1989, pp. 549–554. IEEE Computer Society (1989)
Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Mignosi, F., Restivo, A., Sciortino, M.: Words and forbidden factors. Theoret. Comput. Sci. 273(1–2), 99–117 (2002)
Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: DCC 2009, pp. 193–202. IEEE Computer Society (2009)
Pinho, A.J., Ferreira, P.J.S.G., Garcia, S.P., Rodrigues, J.M.: On finding minimal absent words. BMC Bioinformatics 10 (2009)
Shun, J.: Fast parallel computation of longest common prefixes. In: SC 2014, pp. 387–398. IEEE Computer Society (2014)
Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)
Wu, Z.D., Jiang, T., Su, W.J.: Efficient computation of shortest absent words in a genomic sequence. Inf. Process. Lett. 110(14–15), 596–601 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Barton, C., Heliou, A., Mouchard, L., Pissis, S.P. (2016). Parallelising the Computation of Minimal Absent Words. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science(), vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-32152-3_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32151-6
Online ISBN: 978-3-319-32152-3
eBook Packages: Computer ScienceComputer Science (R0)