Abstract.
A factor u of a word w is (right) univalent if there exists a unique letter a such that ua is still a factor of w. A univalent factor is minimal if none of its proper suffixes is univalent. The starting block of w is the shortest prefix \(\overline{h}_w\) of w such that all proper prefixes of w of length \(\geq |\overline{h}_w|\) are univalent. We study the univalent factors of a word and their relationship with the well-known notions of boxes, superboxes, and minimal forbidden factors. Moreover, we prove some new uniqueness conditions for words based on univalent factors. In particular, we show that a word is uniquely determined by its starting block, the set of the extensions of its minimal univalent factors, and either its length or its terminal box. Finally, we show how the results and techniques presented can be used to solve the problem of sequence assembly for DNA molecules, under reasonable assumptions on the repetitive structure of the molecule considered and on the set of known fragments.
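The three definitions above can be illustrated with a small brute-force sketch. It is not the authors' method, only a direct transcription of the definitions for short words; the function names are hypothetical, and the empty word is assumed to count as a factor (and hence as a proper suffix), which affects which univalent factors are reported as minimal.

```python
def right_extensions(w, u):
    """Set of letters a such that u + a is a factor of w."""
    k = len(u)
    return {w[i + k] for i in range(len(w) - k) if w[i:i + k] == u}


def is_univalent(w, u):
    """u is (right) univalent in w if exactly one letter extends it on the right."""
    return len(right_extensions(w, u)) == 1


def factors(w):
    """All factors of w, including the empty word (an assumption of this sketch)."""
    n = len(w)
    return {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def minimal_univalent_factors(w):
    """Univalent factors none of whose proper suffixes is univalent."""
    return {
        u
        for u in factors(w)
        if is_univalent(w, u)
        and not any(is_univalent(w, u[i:]) for i in range(1, len(u) + 1))
    }


def starting_block(w):
    """Shortest prefix h of w such that every proper prefix of w
    of length >= len(h) is univalent."""
    k = len(w)  # the condition holds vacuously for h = w
    while k > 0 and is_univalent(w, w[:k - 1]):
        k -= 1
    return w[:k]


if __name__ == "__main__":
    w = "abaab"
    print(sorted(minimal_univalent_factors(w)))  # ['aa', 'b', 'ba']
    print(starting_block(w))                     # ab
```

For w = abaab, the letter a is followed by both a and b, so it is not univalent, whereas b, ba, and aa each admit a single right extension and have no univalent proper suffix. The proper prefixes ab, aba, and abaa are all univalent while a is not, so the starting block is ab.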
Received: 4 November 2000 / 23 November 2001
Cite this article
Carpi, A., de Luca, A. & Varricchio, S. Words, univalent factors, and boxes. Acta Informatica 38, 409–436 (2002). https://doi.org/10.1007/s002360100079
DOI: https://doi.org/10.1007/s002360100079