String Pattern Discovery

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3244)

Abstract

Finding a good pattern that discriminates one set of strings from another is a critical task in knowledge discovery. In this paper, we review a series of our works on string pattern discovery. These include theoretical analyses of the learnability of some pattern classes, as well as the development of practical data structures that support efficient string processing.
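
As a rough illustration of the discrimination task described above, the following sketch searches for a substring pattern that best separates a set of positive strings from a set of negative strings. Everything in it is an assumption made for illustration and is not taken from the paper: the pattern class (plain substrings), the score (number of correctly classified strings), the brute-force enumeration, and the names `substrings` and `best_substring_pattern`.

```python
from typing import List, Optional, Set, Tuple


def substrings(s: str) -> Set[str]:
    """Return every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def best_substring_pattern(
    pos: List[str], neg: List[str]
) -> Optional[Tuple[str, int]]:
    """Brute-force search for a substring that best separates pos from neg.

    Illustrative score (an assumption, not the paper's measure): the number
    of positive strings that contain the pattern plus the number of negative
    strings that do not contain it.
    """
    # Candidate patterns: all substrings occurring in any input string.
    candidates: Set[str] = set()
    for s in list(pos) + list(neg):
        candidates |= substrings(s)

    best: Optional[Tuple[str, int]] = None
    # Visit shorter patterns first (ties broken alphabetically) so that the
    # shortest pattern wins among equally good ones.
    for p in sorted(candidates, key=lambda q: (len(q), q)):
        score = sum(p in s for s in pos) + sum(p not in s for s in neg)
        if best is None or score > best[1]:
            best = (p, score)
    return best


if __name__ == "__main__":
    positives = ["abcab", "cabd", "xxab"]
    negatives = ["ba", "cd", "acb"]
    print(best_substring_pattern(positives, negatives))
```

The enumeration is quadratic in the length of each string and is meant only to make the task concrete; it does not reflect the efficient data structures the abstract refers to.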

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shinohara, A. (2004). String Pattern Discovery. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_1

  • DOI: https://doi.org/10.1007/978-3-540-30215-5_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23356-5

  • Online ISBN: 978-3-540-30215-5

  • eBook Packages: Springer Book Archive
