Prediction of E.Coli Promoter Gene Sequences Using a Hybrid Combination Based on Feature Selection, Fuzzy Weighted Pre-processing, and Decision Tree Classifier

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4692)

Abstract

In this paper, we investigate the real-world task of recognizing biological concepts in DNA sequences. Promoters are recognized in strings that represent nucleotides (each one of A, G, T, or C) using a hybrid approach that combines feature selection (FS), fuzzy weighted pre-processing, and the C4.5 decision tree classifier (DCS). The E.coli Promoter Gene Sequences dataset has 57 attributes and 106 samples, comprising 53 promoters and 53 non-promoters. The proposed approach consists of three stages. First, the FS process reduces the dimensionality of the dataset from 57 attributes to 4. Second, fuzzy weighted pre-processing weights the 4 remaining attributes into the interval [0,1]. Finally, the C4.5 decision tree classifier is run to predict the E.coli promoter gene sequences. To evaluate the proposed system, we measured prediction accuracy under 10-fold cross validation, where it obtained a classification accuracy of 93.33%. This result indicates that the proposed system is robust and effective in the prediction of E.coli promoter gene sequences.
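The abstract does not state which criterion the FS stage uses or how the fuzzy weights are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that nucleotide positions are ranked by gain ratio (the split criterion C4.5 itself uses) and that the 10 folds are stratified by class. The fuzzy weighting stage is deliberately left out because the abstract does not define it. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(column, labels):
    """C4.5-style gain ratio of one nominal attribute (e.g. one nucleotide position)."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the class labels after splitting on this attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(column)
    return info_gain / split_info if split_info > 0 else 0.0


def top_positions(sequences, labels, k=4):
    """Indices of the k sequence positions with the highest gain ratio.

    Assumption: gain-ratio ranking stands in for the paper's FS stage,
    whose actual criterion is not given in the abstract.
    """
    n_positions = len(sequences[0])
    scores = [gain_ratio([s[i] for s in sequences], labels) for i in range(n_positions)]
    return sorted(range(n_positions), key=lambda i: scores[i], reverse=True)[:k]


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    slot = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for index in indices:
            folds[slot % n_folds].append(index)
            slot += 1
    return folds


if __name__ == "__main__":
    # Toy data (hypothetical, not the UCI dataset): position 0 separates the classes.
    toy_sequences = ["ac", "ac", "tc", "tc"]
    toy_labels = ["+", "+", "-", "-"]
    print(top_positions(toy_sequences, toy_labels, k=1))

    # Fold sizes for a dataset shaped like the one in the abstract (53 + 53 samples).
    folds = stratified_folds(["+"] * 53 + ["-"] * 53)
    print([len(fold) for fold in folds])
```

With 106 samples, 10 folds cannot be equal in size, so each fold here holds 10 or 11 samples with 5 or 6 from each class. A classifier would be trained on nine folds and scored on the tenth, and the reported accuracy would be the average over the ten held-out folds.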



Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akdemir, B., Polat, K., Güneş, S. (2007). Prediction of E.Coli Promoter Gene Sequences Using a Hybrid Combination Based on Feature Selection, Fuzzy Weighted Pre-processing, and Decision Tree Classifier. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74819-9_16

  • DOI: https://doi.org/10.1007/978-3-540-74819-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74817-5

  • Online ISBN: 978-3-540-74819-9

  • eBook Packages: Computer Science, Computer Science (R0)
