Abstract
In this paper, we first consider some properties of strings that have the same suffix array. Next, we design a data structure that supports rank and select operations over an alphabet Σ using n log|Σ| + o(n log|Σ|) bits and O(log|Σ|) time per operation for a text of length n. It also supports an extended rank operation, \(\mathrm{rank}^{\leq}\), such that \(\mathrm{rank}^{\leq}_{\alpha}(T,i)\) returns the number of letters in string T that are smaller than α, plus the number of occurrences of α up to position i; this operation also runs in O(log|Σ|) time. Using this structure, we implement the directed acyclic word graph (DAWG) succinctly. The main structure takes only n log|Σ| + o(n log|Σ|) bits and supports the basic operations of the DAWG efficiently.
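To make the three operations concrete, the following is a minimal reference sketch of their semantics only. It is not the succinct structure described in the abstract: it runs in linear time and uses no compressed representation. The function names `rank`, `select`, and `rank_leq`, the 1-based inclusive reading of "up to position i", and the convention that `select` returns -1 for a missing occurrence are all assumptions made for illustration.

```python
def rank(text: str, alpha: str, i: int) -> int:
    """Count occurrences of alpha in text[1..i] (assumed 1-based, inclusive)."""
    return text[:i].count(alpha)


def select(text: str, alpha: str, j: int) -> int:
    """Return the 1-based position of the j-th occurrence of alpha.

    Returns -1 when text has fewer than j occurrences (an assumed convention).
    """
    seen = 0
    for pos, ch in enumerate(text, start=1):
        if ch == alpha:
            seen += 1
            if seen == j:
                return pos
    return -1


def rank_leq(text: str, alpha: str, i: int) -> int:
    """Extended rank: letters smaller than alpha anywhere in text,
    plus occurrences of alpha in text[1..i]."""
    smaller = sum(1 for ch in text if ch < alpha)
    return smaller + rank(text, alpha, i)


if __name__ == "__main__":
    t = "abracadabra"
    print(rank(t, "a", 4))      # occurrences of 'a' in "abra"
    print(select(t, "a", 3))    # position of the third 'a'
    print(rank_leq(t, "b", 2))  # letters smaller than 'b', plus 'b's in "ab"
```

A naive implementation like this can serve as a test oracle when checking a succinct implementation against small inputs.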
Supported by NSF of China No.60473099 and Foundation of Young Scientist of Jilin Province No.20040119.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, M., Tang, J., Guo, D., Hu, L., Li, Q. (2006). Succinct Text Indexes on Large Alphabet. In: Cai, JY., Cooper, S.B., Li, A. (eds) Theory and Applications of Models of Computation. TAMC 2006. Lecture Notes in Computer Science, vol 3959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11750321_50
DOI: https://doi.org/10.1007/11750321_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34021-8
Online ISBN: 978-3-540-34022-5
eBook Packages: Computer Science, Computer Science (R0)