Abstract
Finding a good pattern that discriminates one set of strings from another is a critical task in knowledge discovery. In this paper, we review a series of our works on string pattern discovery. They include theoretical analyses of the learnability of some pattern classes, as well as the development of practical data structures that support efficient string processing.
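To make the task concrete, the following is a minimal brute-force sketch, not the method of the reviewed works. It assumes the pattern class is plain substrings and that a pattern is scored by the number of positive strings containing it minus the number of negative strings containing it. The names `substrings` and `best_substring_pattern` and the scoring function are illustrative assumptions.

```python
from typing import Iterable, Optional, Set, Tuple


def substrings(s: str) -> Set[str]:
    """Return every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def best_substring_pattern(
    pos: Iterable[str], neg: Iterable[str]
) -> Optional[Tuple[str, int]]:
    """Exhaustively search for the substring that best separates pos from neg.

    Assumed score: (#positive strings containing p) - (#negative strings
    containing p). Ties are broken by shorter pattern, then alphabetically.
    Returns None when the positive set yields no candidate patterns.
    """
    pos, neg = list(pos), list(neg)

    # Only substrings of positive strings can have a positive score,
    # so they are the only candidates worth enumerating.
    candidates: Set[str] = set()
    for s in pos:
        candidates |= substrings(s)
    if not candidates:
        return None

    def score(p: str) -> int:
        return sum(p in s for s in pos) - sum(p in s for s in neg)

    best = min(candidates, key=lambda p: (-score(p), len(p), p))
    return best, score(best)


if __name__ == "__main__":
    positives = ["abcab", "cabd", "xab"]
    negatives = ["acb", "bca", "dd"]
    print(best_substring_pattern(positives, negatives))
```

Exhaustive enumeration like this is quadratic in the length of each string before any matching is done, which is one reason efficient index structures matter for this problem in practice.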
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shinohara, A. (2004). String Pattern Discovery. In: Ben-David, S., Case, J., Maruoka, A. (eds) Algorithmic Learning Theory. ALT 2004. Lecture Notes in Computer Science, vol 3244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30215-5_1
DOI: https://doi.org/10.1007/978-3-540-30215-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23356-5
Online ISBN: 978-3-540-30215-5
eBook Packages: Springer Book Archive