String Pattern Discovery

Shinohara, Ayumi

doi:10.1007/978-3-540-30215-5_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3244)

Abstract

Finding a good pattern that discriminates one set of strings from another is a critical task in knowledge discovery. In this paper, we review a series of our works on string pattern discovery. They include theoretical analyses of the learnability of some pattern classes, as well as the development of practical data structures that support efficient string processing.
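
To make the discrimination task concrete, the following is a minimal brute-force sketch, not the algorithms or data structures reviewed in the paper. It assumes the simplest pattern class (a pattern matches a string if it occurs as a substring) and an illustrative score (positive strings matched plus negative strings not matched). The function names `substrings`, `score`, and `best_substring_pattern` are hypothetical.

```python
from typing import List, Set, Tuple


def substrings(s: str) -> Set[str]:
    """Return every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def score(pattern: str, pos: List[str], neg: List[str]) -> int:
    """Count strings classified correctly when 'contains pattern' means positive.

    This scoring function is an assumption made for illustration.
    """
    matched_pos = sum(pattern in s for s in pos)
    unmatched_neg = sum(pattern not in s for s in neg)
    return matched_pos + unmatched_neg


def best_substring_pattern(pos: List[str], neg: List[str]) -> Tuple[str, int]:
    """Exhaustively search substrings of the positive strings for the best score.

    The empty pattern (which matches everything) is included as a baseline.
    Ties are broken by shorter length, then lexicographic order.
    """
    candidates = {""}
    for s in pos:
        candidates |= substrings(s)
    best = min(candidates, key=lambda p: (-score(p, pos, neg), len(p), p))
    return best, score(best, pos, neg)


if __name__ == "__main__":
    positives = ["abcab", "cabba", "bcabc"]
    negatives = ["abba", "bbcc", "acac"]
    print(best_substring_pattern(positives, negatives))
```

The exhaustive search enumerates a quadratic number of candidate substrings per string, so it is practical only for small inputs.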

Author information

Authors and Affiliations

Department of Informatics, Kyushu University 33, Fukuoka, 812-8581, Japan; PRESTO, Japan Science and Technology Agency
Ayumi Shinohara

Editor information

Editors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo
Shoham Ben-David
Department of Computer & Information Sciences, University of Delaware, 103 Smith Hall, Newark, DE 19716
John Case
Dept. of Information Technology and Electronics, Ishinomaki Senshu University
Akira Maruoka

Cite this paper

Shinohara, A. (2004). String Pattern Discovery. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_1

DOI: https://doi.org/10.1007/978-3-540-30215-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23356-5
Online ISBN: 978-3-540-30215-5
eBook Packages: Springer Book Archive
