Modified PrefixSpan Method for Motif Discovery in Sequence Databases

Kitakami, Hajime; Kanbara, Tomoki; Mori, Yasuma; Kuroki, Susumu; Yamazaki, Yukiko

doi:10.1007/3-540-45683-X_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2417)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

We propose a motif discovery system that uses a modified PrefixSpan method to extract frequent patterns from an annotated sequence database. Each tuple in the database has a sequence identifier (sequence-id), a sequence, and a set of items, and the set of items represents the annotations. Both frequent sequence patterns and frequent item patterns are extracted from this database. Frequent sequence patterns may occur at identical or non-identical positions across the sequences. The existing PrefixSpan method can extract a large number of identical patterns from sequence databases, but it has no function for extracting frequent patterns that contain gaps or wildcard symbols. The modified method allows gap characters to be incorporated into patterns. It also handles the annotated sequence database effectively, where each tuple pairs a sequence with a set of items. The prototype has been evaluated on three sets of sequences that include the Zinc Finger, Cytochrome C, and Kringle motifs.
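The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of prefix-projected pattern growth with bounded gaps, not the authors' implementation. The function name `mine_gapped_patterns`, the parameters `max_gap` and `max_len`, and the use of `.` as the wildcard symbol are all assumptions made for illustration. Each pattern is grown one symbol at a time from the positions where its current prefix ends, and its support is the number of distinct sequences that contain it.

```python
from collections import defaultdict


def mine_gapped_patterns(sequences, min_support, max_gap=2, max_len=4):
    """Return {pattern: support} for patterns such as 'C.C'.

    Illustrative assumptions (not taken from the paper):
    - '.' marks one wildcard position between two concrete symbols;
    - max_gap bounds the wildcards allowed between consecutive symbols;
    - max_len bounds the number of concrete symbols in a pattern.
    """
    results = {}

    def support_of(occurrences):
        # Support counts distinct sequences, not individual occurrences.
        return len({sid for sid, _ in occurrences})

    def grow(pattern, occurrences, n_symbols):
        # occurrences: set of (sequence index, position of the last symbol).
        if n_symbols == max_len:
            return
        extensions = defaultdict(set)
        for sid, end in occurrences:
            seq = sequences[sid]
            for gap in range(max_gap + 1):
                nxt = end + 1 + gap
                if nxt >= len(seq):
                    break
                extensions[(gap, seq[nxt])].add((sid, nxt))
        for (gap, sym), occ in sorted(extensions.items()):
            support = support_of(occ)
            if support >= min_support:
                new_pattern = pattern + "." * gap + sym
                results[new_pattern] = support
                # Recurse only on the projected occurrences of the new prefix.
                grow(new_pattern, occ, n_symbols + 1)

    # Length-1 patterns: every position of every symbol.
    initial = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for pos, sym in enumerate(seq):
            initial[sym].add((sid, pos))
    for sym, occ in sorted(initial.items()):
        support = support_of(occ)
        if support >= min_support:
            results[sym] = support
            grow(sym, occ, 1)
    return results


# Toy sequences (made up for illustration, not data from the paper).
toy = ["ACDC", "ACEC", "GCFC"]
print(mine_gapped_patterns(toy, min_support=3))
```

In the toy input, the symbol between the two `C` residues differs in every sequence, so only a pattern with a wildcard (`C.C`) reaches a support of 3. A gap-free miner would report just `C` at that threshold.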

Author information

Authors and Affiliations

Hiroshima City University, 3-4-1 Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan
Hajime Kitakami, Tomoki Kanbara, Yasuma Mori & Susumu Kuroki
Center for Genetic Resource Information, National Institute of Genetics, 1111 Yata, Mishima-Shi, Shizuoka-Ken, 411, Japan
Yukiko Yamazaki

Editor information

Editors and Affiliations

School of Information Science and Technology Department of Information and Communication Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Mitsuru Ishizuka
School of Information Technology Knowledge Representation and Reasoning Unit (KRRU) Faculty of Engineering and Information Technology, Griffith University, PMB 50 Gold Coast Mail Centre, Queensland, 9726, Australia
Abdul Sattar

About this paper

Cite this paper

Kitakami, H., Kanbara, T., Mori, Y., Kuroki, S., Yamazaki, Y. (2002). Modified PrefixSpan Method for Motif Discovery in Sequence Databases. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. PRICAI 2002. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_52

DOI: https://doi.org/10.1007/3-540-45683-X_52
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive