Abstract
An important issue in data mining concerns the discovery of patterns that meet a user-specified minimum support. We generalize this problem by introducing the concept of an ambiguous event. An ambiguous event can be substituted for another without modifying the substance of the pattern concerned. For instance, in molecular biology, researchers attempt to identify conserved patterns in a family of proteins known to have evolved from a common ancestor. Such patterns are flexible in the sense that some residues may have been substituted for others during evolution. The notation A[B C] is an example of an ambiguous pattern: it represents the event A followed by either the event B or the event C. A new scoring scheme, based on substitution matrices, is proposed for computing the frequency of ambiguous patterns. A substitution matrix expresses the probability that one event is replaced by another. We propose to adapt the Winepi algorithm [1] to ambiguous events. Finally, we give an application to the discovery of conserved patterns in a particular family of proteins, the cytokine receptors.
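To make the A[B C] notation concrete, the following minimal Python sketch parses an ambiguous pattern and computes a plain support value over a small set of sequences. It is an illustration only, not the paper's method: the function names (`parse_pattern`, `occurs`, `support`) are hypothetical, matching is assumed to be contiguous, support is assumed to be the fraction of sequences containing the pattern, and the substitution-matrix scoring scheme and the windowing of Winepi are not modeled.

```python
from typing import List, Set

# Assumption: an ambiguous pattern is a list of positions, and each position
# is a set of interchangeable events, e.g. "A[B C]" -> [{"A"}, {"B", "C"}].
Pattern = List[Set[str]]


def parse_pattern(text: str) -> Pattern:
    """Parse single-character events; brackets group alternative events."""
    pattern: Pattern = []
    i = 0
    while i < len(text):
        if text[i] == "[":
            j = text.index("]", i)  # raises ValueError on an unclosed bracket
            pattern.append(set(text[i + 1:j].replace(" ", "")))
            i = j + 1
        elif text[i] == " ":
            i += 1  # spaces outside brackets are ignored
        else:
            pattern.append({text[i]})
            i += 1
    return pattern


def occurs(pattern: Pattern, sequence: str) -> bool:
    """True if the pattern matches some contiguous stretch of the sequence."""
    n = len(pattern)
    return any(
        all(sequence[start + k] in pattern[k] for k in range(n))
        for start in range(len(sequence) - n + 1)
    )


def support(pattern: Pattern, sequences: List[str]) -> float:
    """Fraction of sequences in which the pattern occurs at least once."""
    return sum(occurs(pattern, s) for s in sequences) / len(sequences)


if __name__ == "__main__":
    family = ["XABY", "ACZZ", "BBBB", "AAXC"]  # made-up toy sequences
    print(support(parse_pattern("A[B C]"), family))
```

In this toy family, the pattern matches AB in the first sequence and AC in the second, so an unambiguous pattern such as AB alone would have lower support than A[B C]. A pattern would be retained when its support reaches the user-specified minimum.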
References
1. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering frequent episodes in sequences. In First International Conference on Knowledge Discovery and Data Mining (KDD’95), pages 210–215. AAAI Press, August 1995.
2. R. Agrawal and R. Srikant. Mining sequential patterns. In 11th International Conference on Data Engineering, March 1995.
3. R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In P. M. G. Apers, M. Bouzeghoub, and G. Gardarin, editors, Proc. 5th Int. Conf. Extending Database Technology (EDBT), volume 1057 of Lecture Notes in Computer Science, pages 3–17. Springer-Verlag, 25–29 March 1996.
4. H. Mannila and H. Toivonen. Discovering generalized episodes using minimal occurrences. In 2nd International Conference on Knowledge Discovery and Data Mining (KDD’96), pages 146–151. AAAI Press, August 1996.
5. S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89:10915–10919, 1992.
6. J. Glasgow, I. Jurisica, and R. Ng. Data mining and knowledge discovery in databases. In Pacific Symposium on Biocomputing (PSB ’99), January 1999.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gerard, R., Pascal, B., Yannick, J. (2000). Discovery of Ambiguous Patterns in Sequences Application to Bioinformatics. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_69
DOI: https://doi.org/10.1007/3-540-45372-5_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive