Abstract
Sequence analysis is a major area of bioinformatics that encompasses the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on identifying intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is essentially string or pattern mining specific to biological strings. For a long time, however, this point of view was not explicitly embraced in either the data mining or the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term “data mining” is almost absent from the sequence analysis literature, its basic concepts have been applied implicitly. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam email [50], detecting plagiarism [22], and spotting duplications in software systems [14].
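To make the two notions of similarity concrete, the following is a minimal sketch, not the chapter's own method. It assumes that a "segment" is an exact substring of fixed length k and uses a naive k-mer index. The function names (`kmer_positions`, `repeated_segments`, `common_segments`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_segments(seq, k):
    """Intra-sequence similarity: k-mers that occur more than once in seq."""
    return {kmer: pos
            for kmer, pos in kmer_positions(seq, k).items()
            if len(pos) > 1}


def common_segments(seqs, k):
    """Inter-sequence similarity: k-mers present in every sequence in seqs."""
    kmer_sets = [set(kmer_positions(s, k)) for s in seqs]
    return set.intersection(*kmer_sets) if kmer_sets else set()


if __name__ == "__main__":
    # Toy DNA strings, made up for this illustration.
    print(repeated_segments("ACGTACGA", 3))
    print(common_segments(["ACGTACGA", "TTACGG"], 3))
```

This exact-match index only illustrates the problem statement; it does not handle approximate matches or segments of varying length.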
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Abouelhoda, M., Ghanem, M. (2009). String Mining in Bioinformatics. In: Gaber, M. (eds) Scientific Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9
DOI: https://doi.org/10.1007/978-3-642-02788-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02787-1
Online ISBN: 978-3-642-02788-8
eBook Packages: Computer Science, Computer Science (R0)