Abstract
Given a string T of length n, a substring \(u = T[i..j]\) of T is called a shortest unique substring (SUS) for an interval [s, t] if (a) u occurs exactly once in T, (b) u contains the interval [s, t] (i.e., \(i \le s \le t \le j\)), and (c) every substring v of T with \(|v| < |u|\) containing [s, t] occurs at least twice in T. Given a query interval \([s, t] \subset [1, n]\), the interval SUS problem is to output all the SUSs for the interval [s, t]. In this article, we propose a data structure of \(4n + o(n)\) bits that answers an interval SUS query in output-sensitive \(O(occ)\) time, where \(occ\) is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for \(s = t\). Here, we propose a data structure of \(\lceil (\log _2{3} + 1)n \rceil + o(n)\) bits that answers a point SUS query in the same output-sensitive time.
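To make conditions (a) to (c) concrete, the following is a minimal brute-force sketch of an interval SUS query. It is not the compact data structure proposed in the paper and runs in polynomial rather than \(O(occ)\) time; the function names `count_occurrences` and `interval_sus` are hypothetical and chosen only for illustration. Positions are 1-based and inclusive, as in the definition.

```python
from __future__ import annotations


def count_occurrences(text: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(
        1
        for k in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, k)
    )


def interval_sus(text: str, s: int, t: int) -> list[tuple[int, int]]:
    """Return all SUSs for [s, t] as 1-based inclusive pairs (i, j).

    Naive illustration of the definition only: candidate lengths are
    tried in increasing order, so the first length that yields a unique
    substring covering [s, t] is the shortest one (condition (c)).
    """
    n = len(text)
    if not (1 <= s <= t <= n):
        raise ValueError("need 1 <= s <= t <= n")
    for length in range(t - s + 1, n + 1):
        hits = []
        # Condition (b): i <= s and j = i + length - 1 >= t, with j <= n.
        first_i = max(1, t - length + 1)
        last_i = min(s, n - length + 1)
        for i in range(first_i, last_i + 1):
            j = i + length - 1
            # Condition (a): T[i..j] occurs exactly once in T.
            if count_occurrences(text, text[i - 1:j]) == 1:
                hits.append((i, j))
        if hits:
            return hits
    return []  # unreachable for valid input: T itself is always unique


if __name__ == "__main__":
    print(interval_sus("abaab", 3, 3))  # point query, s = t = 3
    print(interval_sus("abaab", 1, 2))  # interval query
```

For T = abaab, the point query at position 3 returns two SUSs, T[2..3] = ba and T[3..4] = aa, which illustrates why a query may have to report more than one answer.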
Notes
1. We show later in Lemma 1 that the number of minimal unique substrings m is at most n.
2. \(\mathsf {Succ}_Y(d)\) can be computed similarly, by distinguishing whether \(BIT_Y[d] =\mathtt {1}\).
3. Although there can be multiple SUSs containing i, their lengths are all equal.
4. The actual reporting of those SUSs is done in Lemma 14.
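The case distinction mentioned in note 2 can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes that Y is a sorted set of positions in [1, n], that \(BIT_Y\) is the bit array marking the members of Y, and that \(\mathsf {Succ}_Y(d)\) denotes the 1-based index in Y of the smallest element that is at least d. The helper names `build_rank` and `succ_index` are hypothetical, and the plain prefix-sum table stands in for a succinct rank structure of \(o(n)\) extra bits.

```python
from itertools import accumulate


def build_rank(bits):
    """Return prefix, where prefix[d] is the number of 1s in bits[1..d].

    bits is a Python list holding BIT_Y; position d (1-based) is bits[d - 1].
    A real succinct structure would answer rank in O(1) with o(n) extra
    bits; this table is only a stand-in for illustration.
    """
    return [0] + list(accumulate(bits))


def succ_index(bits, prefix, d):
    """Index (1-based) in Y of the smallest element >= d, or None.

    Assumed semantics: if BIT_Y[d] = 1, then d itself is in Y and its
    index is rank(d); otherwise the successor is the next member of Y,
    which has index rank(d) + 1.
    """
    r = prefix[d]
    idx = r if bits[d - 1] == 1 else r + 1
    return idx if idx <= prefix[-1] else None


if __name__ == "__main__":
    # Y = {2, 5, 6} over positions 1..7
    bit_y = [0, 1, 0, 0, 1, 1, 0]
    rank = build_rank(bit_y)
    for d in range(1, 8):
        print(d, succ_index(bit_y, rank, d))
```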
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mieno, T., Köppl, D., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M. (2019). Compact Data Structures for Shortest Unique Substring Queries. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_8
DOI: https://doi.org/10.1007/978-3-030-32686-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32685-2
Online ISBN: 978-3-030-32686-9
eBook Packages: Computer Science (R0)