Succinct Text Indexes on Large Alphabet

  • Conference paper
Theory and Applications of Models of Computation (TAMC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3959)

Abstract

In this paper, we first consider some properties of strings that have the same suffix array. Next, we design a data structure that supports rank and select operations over an alphabet Σ for a text of length n, using n log|Σ| + o(n log|Σ|) bits and O(log|Σ|) time per operation. It also supports an extended rank, namely \(\mathrm{rank}^{\leq}\), such that \(\mathrm{rank}^{\leq}_{\alpha}(T,i)\) returns the number of letters in string T that are smaller than α, plus the number of αs up to position i; this operation also runs in O(log|Σ|) time. Using this structure, we implement the DAWG succinctly. The main structure takes only n log|Σ| + o(n log|Σ|) bits and supports the basic operations of the DAWG efficiently.
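To make the three operations in the abstract concrete, here is a minimal sketch in Python. It uses a plain pointer-based wavelet tree, which is a swapped-in textbook technique and not necessarily the structure the paper builds; it is not succinct, since it stores prefix counts as ordinary lists. The class name `WaveletTree` and the method names `rank`, `select`, and `rank_le` are hypothetical. Positions are 1-based, so `rank(c, i)` counts occurrences of `c` in T[1..i].

```python
from bisect import bisect_left


class WaveletTree:
    """Illustrative wavelet tree; assumes letters are comparable (e.g. chars)."""

    def __init__(self, text):
        self.n = len(text)
        self.alphabet = sorted(set(text))
        sigma = len(self.alphabet)
        codes = [bisect_left(self.alphabet, c) for c in text]
        # smaller[k] = number of letters in text whose code is < k
        self.smaller = [0] * (sigma + 1)
        for k in codes:
            self.smaller[k + 1] += 1
        for k in range(sigma):
            self.smaller[k + 1] += self.smaller[k]
        self.root = self._build(codes, 0, sigma)

    def _build(self, codes, lo, hi):
        # A node covers codes in [lo, hi); a single-letter range is a leaf.
        if hi - lo <= 1:
            return None
        mid = (lo + hi) // 2
        ones = [0]  # ones[i] = how many of the first i codes are >= mid
        for k in codes:
            ones.append(ones[-1] + (k >= mid))
        left = self._build([k for k in codes if k < mid], lo, mid)
        right = self._build([k for k in codes if k >= mid], mid, hi)
        return ones, left, right

    def rank(self, c, i):
        """Occurrences of c in T[1..i]; visits O(log sigma) nodes."""
        k = bisect_left(self.alphabet, c)
        if k == len(self.alphabet) or self.alphabet[k] != c:
            return 0
        i = max(0, min(i, self.n))
        node, lo, hi = self.root, 0, len(self.alphabet)
        while node is not None:
            ones, left, right = node
            mid = (lo + hi) // 2
            if k >= mid:
                i, node, lo = ones[i], right, mid
            else:
                i, node, hi = i - ones[i], left, mid
        return i

    def rank_le(self, c, i):
        """Letters of T smaller than c, plus occurrences of c in T[1..i]."""
        k = bisect_left(self.alphabet, c)
        return self.smaller[k] + self.rank(c, i)

    def select(self, c, j):
        """1-based position of the j-th c, or None; binary search on rank."""
        if j < 1 or self.rank(c, self.n) < j:
            return None
        lo, hi = 1, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(c, mid) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

For example, with `wt = WaveletTree("abracadabra")`, `wt.rank("b", 4)` counts the single `b` in `abra`, and `wt.rank_le("b", 4)` adds the five letters `a` that are smaller than `b`. The `select` here is implemented by binary search over `rank` for brevity, so it is slower than the O(log|Σ|) bound stated in the abstract.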

Supported by NSF of China No.60473099 and Foundation of Young Scientist of Jilin Province No.20040119.



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, M., Tang, J., Guo, D., Hu, L., Li, Q. (2006). Succinct Text Indexes on Large Alphabet. In: Cai, JY., Cooper, S.B., Li, A. (eds) Theory and Applications of Models of Computation. TAMC 2006. Lecture Notes in Computer Science, vol 3959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11750321_50


  • DOI: https://doi.org/10.1007/11750321_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34021-8

  • Online ISBN: 978-3-540-34022-5

  • eBook Packages: Computer Science, Computer Science (R0)
