Abstract
Finding tuples in a database that match a particular subsequence (with gaps) is an important problem for a range of applications. Subsequence search is equivalent to searching for regular expressions of the form .*q_1.*q_2.* … .*q_l.*, where the subsequence is q_1 q_2 … q_l. Executing such queries efficiently requires index structures that are fast and that scale to large problem sizes. This paper presents two index structures for such queries, based on tries and bitmaps. Both indices are disk-resident and can therefore be used by large databases with limited available memory. They also support dynamic databases, in which tuples can be added or deleted. Both indices are implemented and validated against a naive approach. The results show that the proposed indices are efficient and incur low I/O and time overhead.
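The sketch below illustrates the idea in the abstract. It shows why subsequence search and the pattern .*q_1.*q_2.* … .*q_l.* are equivalent, and how a bitmap can prune tuples before verification. It is not the authors' trie or bitmap index. It is a hypothetical in-memory illustration that assumes tuples and queries are Python strings of symbols, whereas the paper's indices are disk-resident. The names `is_subsequence`, `subsequence_regex`, and `BitmapFilter` are illustrative only. A per-symbol bitmap can confirm that a tuple contains every query symbol, but it cannot confirm their order, so each candidate is re-checked with the naive ordered scan.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def is_subsequence(query: str, tup: str) -> bool:
    """Naive baseline: True if the symbols of query occur in tup in order, gaps allowed."""
    it = iter(tup)
    # `symbol in it` consumes the iterator up to and including the first match,
    # so each later symbol must be found strictly after the previous one.
    return all(symbol in it for symbol in query)


def subsequence_regex(query: str) -> "re.Pattern[str]":
    """Build the equivalent pattern .*q_1.*q_2.* ... .*q_l.* (symbols escaped)."""
    return re.compile(".*" + ".*".join(re.escape(q) for q in query) + ".*", re.DOTALL)


class BitmapFilter:
    """Hypothetical in-memory filter: bit i of bitmaps[c] is set iff tuple i contains c."""

    def __init__(self) -> None:
        self.tuples: List[Optional[str]] = []  # None marks a deleted tuple
        self.bitmaps: Dict[str, int] = defaultdict(int)

    def add(self, tup: str) -> int:
        """Insert a tuple and return its id."""
        tid = len(self.tuples)
        self.tuples.append(tup)
        for c in set(tup):
            self.bitmaps[c] |= 1 << tid
        return tid

    def delete(self, tid: int) -> None:
        """Clear the tuple's bits so it never appears as a candidate again."""
        tup = self.tuples[tid]
        if tup is None:
            return
        for c in set(tup):
            self.bitmaps[c] &= ~(1 << tid)
        self.tuples[tid] = None

    def search(self, query: str) -> List[int]:
        """Return ids of live tuples that contain query as a subsequence, ascending."""
        if not query:
            return [i for i, t in enumerate(self.tuples) if t is not None]
        candidates = -1  # all bits set; Python ints are unbounded
        for c in set(query):
            candidates &= self.bitmaps.get(c, 0)  # AND: tuple holds every symbol
        result = []
        while candidates:
            low = candidates & -candidates  # isolate the lowest set bit
            tid = low.bit_length() - 1
            candidates ^= low
            if is_subsequence(query, self.tuples[tid]):  # verify symbol order
                result.append(tid)
        return result


if __name__ == "__main__":
    idx = BitmapFilter()
    for t in ["abcde", "aebdc", "xyz", "acbd"]:
        idx.add(t)
    print(idx.search("abd"))  # [0, 1, 3]
```

With the four sample tuples, the bitmaps for `a`, `b`, and `d` intersect to tuples 0, 1, and 3, and all three pass the order check. A tuple such as `"adb"` would also pass the bitmap step but fail verification, because `d` precedes `b`.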
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jain, R., Mohania, M.K., Prabhakar, S. (2013). Efficient Subsequence Search in Databases. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_45
DOI: https://doi.org/10.1007/978-3-642-38562-9_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38561-2
Online ISBN: 978-3-642-38562-9
eBook Packages: Computer Science; Computer Science (R0)