Abstract
Sequence analysis is a major area of bioinformatics, encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on identifying intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has not been explicitly embraced in either the data mining or the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term “data mining” is almost absent from the sequence analysis literature, its basic concepts have been applied implicitly. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mail [50], detecting plagiarism [22], and spotting duplications in software systems [14].
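To make the two notions of similarity concrete, the following is a minimal sketch, not the chapter's own method. It assumes that a "segment" is a fixed-length substring (a k-mer) and uses a plain hash-based index. The function names `kmer_positions`, `repeated_segments`, and `common_segments` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_segments(seq, k):
    """Intra-sequence similarity: k-mers that occur at least twice in seq."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items()
            if len(pos) > 1}


def common_segments(seq_a, seq_b, k):
    """Inter-sequence similarity: k-mers that occur in both sequences."""
    pos_a = kmer_positions(seq_a, k)
    pos_b = kmer_positions(seq_b, k)
    return {kmer: (pos_a[kmer], pos_b[kmer]) for kmer in pos_a if kmer in pos_b}


if __name__ == "__main__":
    # Toy inputs, assumed for illustration only.
    print(repeated_segments("ACGTACGA", 3))
    print(common_segments("ACGT", "TTACGG", 3))
```

This naive approach only finds exact matches of one fixed length and stores every k-mer explicitly, so it is meant to illustrate the problem statements rather than to scale to genome-sized inputs.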
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Abouelhoda, M., Ghanem, M. (2009). String Mining in Bioinformatics. In: Gaber, M. (eds) Scientific Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9
DOI: https://doi.org/10.1007/978-3-642-02788-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02787-1
Online ISBN: 978-3-642-02788-8
eBook Packages: Computer Science, Computer Science (R0)