String Mining in Bioinformatics

Abouelhoda, Mohamed; Ghanem, Moustafa

doi:10.1007/978-3-642-02788-8_9

Abstract

Sequence analysis is a major area in bioinformatics, encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view was explicitly embraced in neither the data mining nor the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term “data mining” is almost missing from the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam e-mail [50], detecting plagiarism [22], and spotting duplications in software systems [14].
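
The two tasks named in the abstract can be illustrated with a small sketch. It assumes that a "segment" is a fixed-length substring (a k-mer) and that a plain hash table is an acceptable index; both are illustrative assumptions, not the chapter's method, and the function names are hypothetical.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def repeated_segments(seq, k):
    """Intra-sequence similarity: k-mers that occur at least twice in seq."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items() if len(pos) > 1}


def common_segments(seq_a, seq_b, k):
    """Inter-sequence similarity: k-mers that occur in both sequences."""
    pos_a = kmer_positions(seq_a, k)
    pos_b = kmer_positions(seq_b, k)
    # Intersect the key sets, then report the positions in each sequence.
    return {kmer: (pos_a[kmer], pos_b[kmer]) for kmer in pos_a.keys() & pos_b.keys()}


if __name__ == "__main__":
    print(repeated_segments("ACGTACGA", 3))
    print(common_segments("ACGTT", "TTACG", 3))
```

The sketch only finds exact matches of one chosen length and stores every k-mer explicitly, so it is meant to make the two problem statements concrete rather than to be efficient on genome-scale input.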

Author information

Authors and Affiliations

Cairo University, Orman, Gamaa Street, 12613 Al Jizah, Giza, Egypt
Mohamed Abouelhoda
Nile University, Cairo-Alex Desert Rd, Cairo, 12677, Egypt
Mohamed Abouelhoda

Editor information

Editors and Affiliations

Caulfield School of Information Technology, Monash University, Caulfield East, Australia
Mohamed Medhat Gaber

About this chapter

Cite this chapter

Abouelhoda, M., Ghanem, M. (2009). String Mining in Bioinformatics. In: Gaber, M. (eds) Scientific Data Mining and Knowledge Discovery. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02788-8_9

DOI: https://doi.org/10.1007/978-3-642-02788-8_9
Published: 31 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02787-1
Online ISBN: 978-3-642-02788-8
eBook Packages: Computer Science, Computer Science (R0)
