Data Mining for Motifs in DNA Sequences

Bell, D. A.; Guan, J. W.

doi:10.1007/3-540-39205-X_85

D. A. Bell &
J. W. Guan

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2639)

Included in the following conference series:

International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing

Abstract

In the large collections of genomic information accumulated in recent years there is potentially significant knowledge for exploitation in medicine and in the pharmaceutical industry. One interesting approach to the distillation of such knowledge is to detect strings in DNA sequences which are very repetitive within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients who have been classified in some way, such as sharing a particular medical diagnosis). Motifs are strings that occur relatively frequently.
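The two notions of frequency mentioned above (repetition within one sequence, and occurrence across several sequences) can be illustrated with a minimal sketch. The function names `kmer_counts` and `support`, and the choice to measure cross-sequence frequency as the fraction of sequences containing a string, are assumptions made for illustration; the paper's own definitions may differ.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring of one sequence (overlaps included)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def support(sequences, motif):
    """Fraction of sequences that contain `motif` at least once.

    Assumes `sequences` is a non-empty list of strings.
    """
    return sum(motif in s for s in sequences) / len(sequences)


# Within one sequence: which 3-letter strings repeat?
counts = kmer_counts("ACGACGA", 3)

# Across sequences: how widely shared is a candidate string?
shared = support(["ACGT", "TTAC", "GGGG"], "AC")
```

A string would then be a motif candidate when one of these measures exceeds a user-chosen threshold.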

In this paper we present basic theory and algorithms for finding such frequent and common strings. We are particularly interested in strings which are maximally frequent and, having discovered very frequent motifs, we show how to mine association rules by an existing rough-set-based technique. Further work and applications are in progress.
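As a rough illustration of what "maximally frequent" could mean in practice, the sketch below grows candidate strings one letter at a time and then keeps only those frequent strings that are not contained in a longer frequent string. This is a generic level-wise search, not the algorithm of the paper; the names `frequent_strings` and `maximal_frequent`, and the support measure (fraction of sequences containing the string), are assumptions.

```python
def frequent_strings(sequences, min_support):
    """Return {string: support} for all strings whose support >= min_support.

    Level-wise search: a string can only be frequent if its prefix is
    frequent, so only frequent strings are extended by one more letter.
    """
    if not sequences or not 0 < min_support <= 1:
        raise ValueError("need at least one sequence and 0 < min_support <= 1")
    n = len(sequences)
    alphabet = sorted(set("".join(sequences)))
    frequent = {}
    level = [""]  # start from the empty prefix
    while level:
        next_level = []
        for prefix in level:
            for letter in alphabet:
                candidate = prefix + letter
                sup = sum(candidate in s for s in sequences) / n
                if sup >= min_support:
                    frequent[candidate] = sup
                    next_level.append(candidate)
        level = next_level
    return frequent


def maximal_frequent(frequent):
    """Keep frequent strings that are not substrings of another frequent string."""
    return sorted(
        s for s in frequent
        if not any(s != t and s in t for t in frequent)
    )


sequences = ["ACGT", "ACGA", "TACG"]
motifs = maximal_frequent(frequent_strings(sequences, min_support=1.0))
```

The search terminates because no candidate longer than the longest input sequence can occur in any sequence. The cost grows with the number of frequent strings, so a sketch like this suits small inputs only.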

Author information

Authors and Affiliations

School of Computer Science, The Queen’s University of Belfast, BT7 1NN, Northern Ireland, UK
D. A. Bell & J. W. Guan
College of Computer Science and Technology, Jilin University, 130012, Changchun, P.R. China
J. W. Guan

Editor information

Editors and Affiliations

Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
Guoyin Wang
Department of Computer Science, Nanchang University, Nanchang, 330029, P.R. China
Qing Liu
Department of Computer Science, University of Regina, Regina, Saskatchewan, S4S 0A2, Canada
Yiyu Yao
Institute of Mathematics, Warsaw University, Banacha 2, 02-097, Warsaw, Poland
Andrzej Skowron

Cite this paper

Bell, D.A., Guan, J.W. (2003). Data Mining for Motifs in DNA Sequences. In: Wang, G., Liu, Q., Yao, Y., Skowron, A. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. RSFDGrC 2003. Lecture Notes in Computer Science, vol 2639. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39205-X_85

DOI: https://doi.org/10.1007/3-540-39205-X_85
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-14040-5
Online ISBN: 978-3-540-39205-7
eBook Packages: Springer Book Archive
