String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)

Real-Time Systems

Abstract

This paper presents Tagged Sub-optimal Code (TSC), a new coding technique that speeds up string matching over compressed databases on personal digital assistants (PDAs). TSC is a variable-length sub-optimal code that supports the minimal prefix property. It always determines its codeword boundaries without traversing a tree or lookup table. The TSC technique may benefit many types of applications, for example by speeding up string matching over compressed text and by speeding up the decoding process. This paper also presents two algorithms for string matching over compressed text: SCTT, which uses TSC, and SCTB, which uses the Byte Pair Encoding (BPE) technique.

Several experiments were conducted to compare the performance of TSC, BPE, and Huffman code. Several PDA databases with different record sizes were used: the well-known Calgary dataset and a set of small-sized PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffman-based algorithm. SCTT also matches the search time of searching uncompressed databases and is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do lists, and memos, SCTT is the recommended method regardless of the average record length, since the time required to compress the updated records using BPE poses significant delays compared to TSC.
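The abstract does not define the TSC codewords themselves, so the following is only a minimal sketch of the general idea of a tagged variable-length code, not the authors' scheme. It assumes a hypothetical byte-oriented code in which the high bit of a byte (`TAG`) marks the last byte of a codeword; the function names and the character-level codebook are likewise illustrative assumptions. The sketch shows how such a tag lets a decoder find codeword boundaries with a single bit test, and how a pattern can be matched directly in the compressed bytes.

```python
from collections import Counter

TAG = 0x80  # assumed tag bit: set only on the final byte of a codeword


def encode_rank(rank: int) -> bytes:
    """Write a frequency rank in base 128 and tag the last byte."""
    digits = [rank % 128]
    rank //= 128
    while rank > 0:
        digits.append(rank % 128)
        rank //= 128
    digits.reverse()
    out = bytearray(digits)
    out[-1] |= TAG
    return bytes(out)


def build_codebook(text: str) -> dict[str, bytes]:
    """Give more frequent characters lower ranks, hence shorter codewords."""
    ranked = [sym for sym, _ in Counter(text).most_common()]
    return {sym: encode_rank(rank) for rank, sym in enumerate(ranked)}


def compress(text: str, codebook: dict[str, bytes]) -> bytes:
    """Concatenate the codewords of all characters in the text."""
    return b"".join(codebook[ch] for ch in text)


def split_codewords(data: bytes) -> list[bytes]:
    """Find codeword boundaries by testing the tag bit; no tree or table."""
    words, start = [], 0
    for i, byte in enumerate(data):
        if byte & TAG:
            words.append(data[start:i + 1])
            start = i + 1
    return words


def search(compressed: bytes, pattern: str, codebook: dict[str, bytes]) -> list[int]:
    """Return byte offsets of codeword-aligned occurrences of the pattern."""
    if not pattern or any(ch not in codebook for ch in pattern):
        return []
    needle = compress(pattern, codebook)
    hits = []
    pos = compressed.find(needle)
    while pos != -1:
        # Aligned only if the match starts the data or follows a tagged byte.
        if pos == 0 or compressed[pos - 1] & TAG:
            hits.append(pos)
        pos = compressed.find(needle, pos + 1)
    return hits


if __name__ == "__main__":
    text = "abracadabra"
    book = build_codebook(text)
    data = compress(text, book)
    print(search(data, "abra", book))
```

In this sketch the pattern is compressed once and the scan runs over the compressed bytes without decoding them. The alignment test rejects matches that begin in the middle of a longer codeword, which is the role the tag plays here.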

Author information

Corresponding author

Correspondence to Abdelghani Bellaachia.

Additional information

Abdelghani Bellaachia is an associate professor in the Computer Science Department at George Washington University. He received his Diploma of Engineering from the Mohammadia School of Engineering in Rabat, Morocco, in 1983, and his MS and Doctor of Science degrees from the George Washington University in 1992. He was the chief architect of the Arabization of the Palm-OS platform. His research interests include data mining, multi-lingual information retrieval systems, cross-language retrieval systems, database management systems, bio-informatics, design and analysis of algorithms, handheld computing, and parallel processing.

Iehab AL Rassan works for the Ministry of Higher Education in Saudi Arabia as director of the information technology department. He received his B.A. in Computer Information Systems from King Faisal University. He then received his M.S. in Computer Science from Fairleigh Dickinson University and his Doctor of Science in Computer Science from the George Washington University. His research interests include coding theory, information retrieval, string-matching algorithms, data compression, and handheld computing.

About this article

Cite this article

Bellaachia, A., AL Rassan, I. String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC). Real-Time Syst 29, 227–246 (2005). https://doi.org/10.1007/s11241-005-6886-9

  • DOI: https://doi.org/10.1007/s11241-005-6886-9
