Abstract
This study explores an alternative way of storing text files in a different format that speeds up searching. The input file is decomposed into two parts, a filter and a payload. The filter is composed of the k most informative bits of each byte of the original file; the remaining bits form the payload. The most informative bits are selected according to their entropy. When a pattern is to be searched for in the new file structure, the same decomposition is applied to the pattern. The filter part of the pattern is searched for in the filter part of the file, followed by verification against the payload at the matching positions. Experiments conducted on natural language texts, plain ASCII DNA sequences, and random byte sequences showed that search with the proposed scheme is on average two times faster than the tested exact pattern matching algorithms.
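The decomposition and the two-phase search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the "most informative" bits are taken to be the bit positions with the highest individual Shannon entropy over the file, filter and payload values are kept one per byte rather than packed, and a naive scan stands in for the exact matching algorithms tested in the paper. All function names (`bit_entropies`, `choose_filter_bits`, `decompose`, `search`) are hypothetical.

```python
import math


def bit_entropies(data: bytes) -> list[float]:
    """Shannon entropy (in bits) of each of the 8 bit positions over all bytes."""
    n = len(data)
    if n == 0:
        return [0.0] * 8
    entropies = []
    for pos in range(8):
        p = sum((b >> pos) & 1 for b in data) / n  # fraction of 1-bits
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit carries no information
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies


def choose_filter_bits(data: bytes, k: int) -> list[int]:
    """Assumed selection rule: the k positions with the highest entropy."""
    h = bit_entropies(data)
    ranked = sorted(range(8), key=lambda pos: (-h[pos], pos))
    return sorted(ranked[:k])


def extract(byte: int, positions: list[int]) -> int:
    """Gather the bits of `byte` at `positions` into one small integer."""
    out = 0
    for i, pos in enumerate(positions):
        out |= ((byte >> pos) & 1) << i
    return out


def decompose(data: bytes, filter_bits: list[int]) -> tuple[list[int], list[int]]:
    """Split every byte into a filter value and a payload value."""
    payload_bits = [p for p in range(8) if p not in filter_bits]
    filt = [extract(b, filter_bits) for b in data]
    payload = [extract(b, payload_bits) for b in data]
    return filt, payload


def search(text_filter: list[int], text_payload: list[int],
           pattern: bytes, filter_bits: list[int]) -> list[int]:
    """Match on the filter first, then verify candidates on the payload."""
    pat_filter, pat_payload = decompose(pattern, filter_bits)
    m = len(pattern)
    hits = []
    for i in range(len(text_filter) - m + 1):
        if text_filter[i:i + m] != pat_filter:
            continue  # rejected by the filter; the payload is never read
        if text_payload[i:i + m] == pat_payload:
            hits.append(i)  # verified occurrence
    return hits


text = b"abracadabra abracadabra"
bits = choose_filter_bits(text, 4)
text_filter, text_payload = decompose(text, bits)
print(search(text_filter, text_payload, b"abra", bits))  # [0, 7, 12, 19]
```

Because the filter and payload bits together reconstruct each byte exactly, a position that passes both checks is a true occurrence, so the sketch performs exact matching whichever k bits are chosen. The choice of bits affects only how many candidates the filter lets through to verification.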
Copyright information
© 2011 Springer Science+Business Media B.V.
About this paper
Cite this paper
Külekci, M.O., Vitter, J.S., Xu, B. (2011). Boosting Pattern Matching Performance via k-bit Filtering. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_6
DOI: https://doi.org/10.1007/978-90-481-9794-1_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9793-4
Online ISBN: 978-90-481-9794-1
eBook Packages: Engineering, Engineering (R0)