Compact Data Structures for Shortest Unique Substring Queries

Conference paper in String Processing and Information Retrieval (SPIRE 2019)

Abstract

Given a string T of length n, a substring \(u = T[i..j]\) of T is called a shortest unique substring (SUS) for an interval \([s, t]\) if (a) u occurs exactly once in T, (b) u contains the interval \([s, t]\) (i.e., \(i \le s \le t \le j\)), and (c) every substring v of T with \(|v| < |u|\) containing \([s, t]\) occurs at least twice in T. Given a query interval \([s, t] \subset [1, n]\), the interval SUS problem is to output all the SUSs for the interval \([s, t]\). In this article, we propose a \(4n + o(n)\) bits data structure answering an interval SUS query in output-sensitive \(O(occ)\) time, where \(occ\) is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for \(s = t\). Here, we propose a \(\lceil (\log _2{3} + 1)n \rceil + o(n)\) bits data structure answering a point SUS query in the same output-sensitive time.
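To make conditions (a) to (c) concrete, the following is a minimal brute-force sketch of an interval SUS query. It is not the compact data structure proposed in the paper: it uses far more than \(4n + o(n)\) bits and does not run in \(O(occ)\) time. The function name `interval_sus` and the 1-based inclusive `(i, j)` output convention are assumptions made only for this illustration.

```python
from collections import Counter


def interval_sus(T: str, s: int, t: int) -> list[tuple[int, int]]:
    """Return every SUS for [s, t] as 1-based inclusive (i, j) pairs.

    Brute-force illustration of the definition only (hypothetical helper,
    not the paper's data structure).
    """
    n = len(T)
    if not (1 <= s <= t <= n):
        raise ValueError("need 1 <= s <= t <= n")

    # Try candidate lengths from the shortest that can cover [s, t] upward,
    # so the first length with a unique covering substring is the SUS length.
    for length in range(t - s + 1, n + 1):
        # Occurrence count of every substring of this length.
        counts = Counter(T[k:k + length] for k in range(n - length + 1))

        # A 1-based start i covers [s, t] iff i <= s and i + length - 1 >= t.
        lo = max(1, t - length + 1)
        hi = min(s, n - length + 1)

        found = [
            (i, i + length - 1)
            for i in range(lo, hi + 1)
            if counts[T[i - 1:i - 1 + length]] == 1  # condition (a)
        ]
        if found:
            return found

    return []  # not reached: T itself is unique and covers every interval


if __name__ == "__main__":
    print(interval_sus("abba", 2, 2))   # point query with two SUSs
    print(interval_sus("abcab", 4, 5))  # interval query
```

For the point query at position 2 of `abba`, both `ab` at positions 1..2 and `bb` at positions 2..3 are unique and no single character is, so two SUSs of equal length are reported.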

Notes

  1. We show later in Lemma 1 that the number of minimal unique substrings m is at most n.

  2. \(\mathsf {Succ}_Y(d)\) can be computed similarly by considering the case whether \(BIT_Y[d] =\mathtt {1}\).

  3. Although there can be multiple SUSs containing i, their lengths are all equal.

  4. The actual reporting of those SUSs is done in Lemma 14.

Author information

Corresponding author

Correspondence to Takuya Mieno.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mieno, T., Köppl, D., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M. (2019). Compact Data Structures for Shortest Unique Substring Queries. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_8

  • DOI: https://doi.org/10.1007/978-3-030-32686-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32685-2

  • Online ISBN: 978-3-030-32686-9

  • eBook Packages: Computer Science (R0)
